Perl Unicode Cookbook: Unicode Named Character Sequences

℞ 9: Unicode named sequences

Unicode defines named character sequences, each of which gives a single name to a fixed sequence of several Unicode characters. The charnames pragma allows the use of these named sequences in string literals, just as it allows the use of Unicode named characters.

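In Perl, a named character sequence looks just like a character name inside \N{...}, but it expands to more than one code point. Notice the %v vector flag in the printf format, which formats the code point of every character in the string and joins the results with dots: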

use charnames qw(:full);
my $seq = "\N{LATIN CAPITAL LETTER A WITH MACRON AND GRAVE}";
printf "U+%v04X\n", $seq;
# prints: U+0100.0300
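
To confirm that the single name really produced two characters, the string can be inspected directly. The following is a minimal sketch, assuming Perl 5.14 or later; the variable names are illustrative only. It also looks the same name up at runtime with charnames::string_vianame, which accepts named sequences as well as single character names.

```perl
use strict;
use warnings;
use charnames qw(:full);

# One name in the source, two characters in the resulting string.
my $seq = "\N{LATIN CAPITAL LETTER A WITH MACRON AND GRAVE}";

# length counts characters (code points), not bytes.
my $count = length $seq;

# Format each character's code point individually.
my @codepoints = map { sprintf "U+%04X", ord } split //, $seq;

print "$count code points: @codepoints\n";
# prints: 2 code points: U+0100 U+0300

# Runtime lookup by name; returns the whole sequence as a string.
my $same = charnames::string_vianame("LATIN CAPITAL LETTER A WITH MACRON AND GRAVE");
print "runtime lookup matches\n" if defined $same && $same eq $seq;
```

The runtime form is useful when the name is only known as data, since \N{...} must be spelled out in the source at compile time.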

While each version of Unicode may update the official list of named sequences, the latest version of the list is always available from the Unicode Consortium as the NamedSequences.txt data file. Perl 5.14 supports Unicode 6.0, and Perl 5.16 supports Unicode 6.1.
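
Because the list of named sequences depends on the Unicode version bundled with a given perl, it can help to check both at runtime. The following is a rough sketch using the core Unicode::UCD module, assuming a perl recent enough to provide its namedseq function; the variable names are illustrative.

```perl
use strict;
use warnings;
use Unicode::UCD qw(namedseq);

# Version of the Unicode database bundled with this perl, such as "6.1.0".
my $version = Unicode::UCD::UnicodeVersion();
print "Unicode version: $version\n";

# With no arguments in list context, namedseq() returns name => string pairs
# for every named sequence this perl knows about.
my %sequences = namedseq();
printf "%d named sequences known\n", scalar keys %sequences;

# With one argument in list context, namedseq() returns the code points.
my @cps = namedseq("LATIN CAPITAL LETTER A WITH MACRON AND GRAVE");
print join(" ", map { sprintf "U+%04X", $_ } @cps), "\n";
```

The number of sequences reported will vary between perl releases, so the count is best treated as informational rather than something to rely on.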

Previous: ℞ 8: Unicode Named Characters

Series Index: The Standard Preamble

Next: ℞ 10: Custom Named Characters
