Perl Unicode Cookbook: Unicode Named Character Sequences

℞ 9: Unicode named sequences

Unicode defines named character sequences: a single name that stands for a sequence of several Unicode characters. The charnames pragma lets you use these named sequences in string literals, just as it lets you use the names of individual Unicode characters.
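Because named characters and named sequences share the same \N{...} syntax, a sequence can go anywhere in a literal that a single named character could. The sketch below assumes Perl 5.14 or later, as this recipe does; the variable names are illustrative. It compares a named character with a named sequence and checks that the sequence matches the string you get by spelling out its two characters by name:

```perl
use strict;
use warnings;
use charnames qw(:full);

# A single named character: one codepoint (U+0100).
my $char = "\N{LATIN CAPITAL LETTER A WITH MACRON}";

# A named sequence: one name, two codepoints (U+0100 U+0300).
my $seq = "\N{LATIN CAPITAL LETTER A WITH MACRON AND GRAVE}";

# The same two characters, each written with its own name.
my $spelled = "\N{LATIN CAPITAL LETTER A WITH MACRON}\N{COMBINING GRAVE ACCENT}";

printf "char: %d codepoint(s)\n", length $char;   # 1
printf "seq:  %d codepoint(s)\n", length $seq;    # 2
print "seq eq spelled: ", ($seq eq $spelled ? "yes" : "no"), "\n";
```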

In Perl, a named sequence looks just like a character name inside \N{...}, but it produces more than one codepoint. Notice how the vector flag in printf's %v04X format prints each codepoint separately, joined by a dot:

use charnames qw(:full);
my $seq = "\N{LATIN CAPITAL LETTER A WITH MACRON AND GRAVE}";
printf "U+%v04X\n", $seq;   # prints U+0100.0300
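The vector flag can also take its join string from the argument list (a * before the v), which is handy when you want conventional "U+XXXX U+YYYY" output instead of dotted ordinals. Here is a short sketch comparing that form with the default dotted form and with an explicit map over ord(). The " U+" separator is just one choice:

```perl
use strict;
use warnings;
use charnames qw(:full);

my $seq = "\N{LATIN CAPITAL LETTER A WITH MACRON AND GRAVE}";

# %vX formats each character's ordinal and joins the pieces with ".".
my $dotted = sprintf "U+%v04X", $seq;

# *v takes the join string from the argument list instead of using ".".
my $spaced = sprintf "U+%0*v4X", " U+", $seq;

# The same result without the vector flag: format each ord() explicitly.
my $mapped = join " ", map { sprintf "U+%04X", ord } split //, $seq;

print "$dotted\n$spaced\n$mapped\n";
```

The vector flag keeps the format string short. The map version spells out the per-character loop and is easier to adapt, for example to add character names alongside the codepoints.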

Each new version of Unicode may add to the official list of named sequences, which is published in the Unicode Named Sequences data file (NamedSequences.txt in the Unicode Character Database). Perl 5.14 supports Unicode 6.0, and Perl 5.16 will support Unicode 6.1.
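Because the set of named sequences depends on the Unicode version your perl ships with, it can help to ask perl directly. The sketch below assumes Perl 5.14 or later, where charnames::string_vianame gives a runtime equivalent of \N{...}. It also uses the core Unicode::UCD module's UnicodeVersion and namedseq functions. It reports the Unicode version, looks up a sequence name held in a variable, and counts the named sequences available:

```perl
use strict;
use warnings;
use charnames ();                          # provides charnames::string_vianame
use Unicode::UCD qw(namedseq UnicodeVersion);

# Which Unicode version does this perl's character database implement?
print "Unicode ", UnicodeVersion(), "\n";

my $name = "LATIN CAPITAL LETTER A WITH MACRON AND GRAVE";

# Runtime lookup of a name stored in a variable, like \N{...} at compile time.
my $via_charnames = charnames::string_vianame($name);

# Unicode::UCD looks up the same named sequence; scalar context gives a string.
my $via_ucd = namedseq($name);

printf "%s => U+%v04X\n", $name, $via_charnames;
print "lookups agree\n" if defined $via_ucd && $via_ucd eq $via_charnames;

# With no argument, namedseq() returns a hash of every known named sequence.
my %all = namedseq();
printf "%d named sequences available\n", scalar keys %all;
```

The exact count printed at the end depends on the Unicode version your perl was built with.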

Previous: ℞ 8: Unicode Named Characters

Series Index: The Standard Preamble

Next: ℞ 10: Custom Named Characters
