Perl Unicode Cookbook: Unicode Literals by Number

℞ 5: Unicode literals by character number

In an interpolated literal, whether a double-quoted string or a regex, you may specify a character by its code point number, written in hexadecimal, using the \x{HHHHHH} escape.

 String: "\x{3a3}"
 Regex:  /\x{3a3}/

 String: "\x{1d45b}"
 Regex:  /\x{1d45b}/

 # even non-BMP ranges in regex work fine
 /[\x{1D434}-\x{1D467}]/
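To see these escapes at work, here is a minimal sketch that can be run as a standalone script. The variable names are only illustrative, and the binmode call is an assumption about how you want output encoded; it is not part of the recipe itself.

```perl
use strict;
use warnings;

# Encode output as UTF-8 so printing these characters does not warn.
binmode(STDOUT, ':encoding(UTF-8)');

my $sigma  = "\x{3a3}";     # U+03A3  GREEK CAPITAL LETTER SIGMA
my $math_n = "\x{1d45b}";   # U+1D45B MATHEMATICAL ITALIC SMALL N

# Each escape yields exactly one character, even outside the BMP.
my $sigma_len  = length($sigma);
my $math_n_len = length($math_n);

# The same escape works inside a regex, including in a character-class range.
my $sigma_matches = ("\x{3a3}\x{3b1}" =~ /\x{3a3}/)           ? 1 : 0;
my $in_range      = ($math_n =~ /[\x{1D434}-\x{1D467}]/)      ? 1 : 0;

printf "%s is U+%04X, length %d\n", $sigma,  ord($sigma),  $sigma_len;
printf "%s is U+%04X, length %d\n", $math_n, ord($math_n), $math_n_len;
```

Both strings have a length of one character, because Perl counts characters here rather than encoded bytes.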

The BMP (Basic Multilingual Plane, or Plane 0) contains the most common Unicode characters; it covers code points 0x0000 through 0xFFFF. Characters in other planes are much more specialized, and many of them are of historical interest.
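Because each plane holds 0x10000 code points, you can find a character's plane by shifting its code point right by 16 bits. The following is a rough sketch; plane_of is a hypothetical helper written for this illustration, not a built-in or a function from any module.

```perl
use strict;
use warnings;

# Hypothetical helper: return the Unicode plane number of the first
# character in the given string. Plane 0 is the BMP.
sub plane_of {
    my ($char) = @_;
    return ord($char) >> 16;
}

my $sigma_plane = plane_of("\x{3a3}");     # 0, inside the BMP
my $math_plane  = plane_of("\x{1d45b}");   # 1, outside the BMP

printf "U+03A3  is in plane %d\n", $sigma_plane;
printf "U+1D45B is in plane %d\n", $math_plane;
```

A result of 0 means the character is in the BMP; anything higher means it lives in a supplementary plane.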

Use Unicode charts to find character numbers, or see the recipe for translating characters to numbers and vice versa.

Previous: ℞ 4: Characters and Their Numbers

Series Index: The Standard Preamble

Next: ℞ 6: Get Character Names by Number
