Perl Unicode Cookbook: Unicode Literals by Number

℞ 5: Unicode literals by character number

In an interpolated literal, whether a double-quoted string or a regex, you may specify a character by its number (its code point, written in hexadecimal) using the \x{HHHHHH} escape.

 String: "\x{3a3}"
 Regex:  /\x{3a3}/

 String: "\x{1d45b}"
 Regex:  /\x{1d45b}/

 # even non-BMP ranges in regex work fine
 /[\x{1D434}-\x{1D467}]/
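
The fragments above can be combined into a small runnable script. This is only a sketch: the variable names are chosen for illustration, and the output layer assumes a terminal that expects UTF-8.

```perl
use strict;
use warnings;

# A string built from a character number: U+03A3 GREEK CAPITAL LETTER SIGMA.
my $sigma = "\x{3a3}";

# The same escape inside a regex matches that character.
my $sigma_matches = $sigma =~ /\x{3a3}/ ? 1 : 0;

# A character outside the BMP: U+1D45B MATHEMATICAL ITALIC SMALL N.
my $math_n = "\x{1d45b}";

# A character class range made of non-BMP code points
# (the same range used in the recipe above).
my $in_range = $math_n =~ /[\x{1D434}-\x{1D467}]/ ? 1 : 0;

# Each escape yields exactly one character, whatever its plane.
my $length = length $math_n;

# Assumption: STDOUT should be encoded as UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');
print "$sigma $math_n matches=$sigma_matches in_range=$in_range length=$length\n";
```

Without the binmode call (or an equivalent such as the open pragma), Perl would warn about a wide character when printing these strings.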

The BMP (or Basic Multilingual Plane, or Plane 0) contains the most common Unicode characters; it spans code points 0x0000 through 0xFFFF, although the last two (0xFFFE and 0xFFFF) are noncharacters. Characters in other planes are much more specialized; many of them are of historical interest.
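
Because each plane holds 0x10000 code points, a character's plane number is its code point shifted right by 16 bits. A minimal sketch, where plane_of is a hypothetical helper name and not part of Perl or any module:

```perl
use strict;
use warnings;

# Hypothetical helper: return the plane number of a single character.
# Each plane covers 0x10000 code points, so the plane is the code point >> 16.
sub plane_of {
    my ($char) = @_;
    return ord($char) >> 16;
}

my $sigma_plane  = plane_of("\x{3a3}");    # BMP character
my $math_n_plane = plane_of("\x{1d45b}");  # non-BMP character

printf "U+%04X is in plane %d\n", 0x3a3,   $sigma_plane;
printf "U+%04X is in plane %d\n", 0x1d45b, $math_n_plane;
```

Here the first character falls in Plane 0 (the BMP) and the second in Plane 1.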

Use Unicode charts to find character numbers, or see the recipe for translating characters to numbers and vice versa.

Previous: ℞ 4: Characters and Their Numbers

Series Index: The Standard Preamble

Next: ℞ 6: Get Character Names by Number
