℞ 25: Match Unicode properties in regex with \p, \P
Every Unicode codepoint has one or more properties that indicate the
rules which apply to it. Perl's regex engine is aware of these
properties: use the \p{} metacharacter sequence to match a codepoint
that has a given property, and its inverse, \P{}, to match a codepoint
that lacks it.
Each property has a short name and a long name. For example, to match any
codepoint which has the Letter property, you may use
\p{Letter} or \p{L}. Similarly, to match any codepoint which lacks the
Uppercase property, you may use \P{Uppercase} or \P{Upper}. The
"Unicode Character Properties" section of perlunicode (run perldoc
perlunicode) describes these properties in greater detail.
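As a minimal sketch of the difference between \p and \P, the snippet below counts matches against a small sample string. The string and the variable names are made up for illustration; codepoints are written as \x{...} escapes so the example does not depend on the source file's encoding.

```perl
use strict;
use warnings;

# Hypothetical sample: "A", n-with-tilde (U+00F1), "b", space,
# Greek capital sigma (U+03A3), "9", space, Cyrillic small zhe (U+0436).
my $text = "A\x{F1}b \x{3A3}9 \x{436}";

# \p{L} matches every codepoint that has the Letter property.
my @letters = $text =~ /\p{L}/g;

# \P{L} matches every codepoint that lacks it (two spaces and the digit).
my @non_letters = $text =~ /\P{L}/g;

# \P{Upper} matches everything that is not uppercase, letters or otherwise.
my @not_upper = $text =~ /\P{Upper}/g;

# The long and short names select the same codepoints.
my $same = join('', $text =~ /\p{Letter}/g) eq join('', @letters);

printf "letters=%d non_letters=%d not_upper=%d same=%s\n",
    scalar @letters, scalar @non_letters, scalar @not_upper,
    $same ? 'yes' : 'no';
```

Each codepoint in the sample lands in exactly one of @letters and @non_letters, which is a quick way to confirm that \P{L} is the complement of \p{L}.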
Examples of these properties useful in regex include:
\pL, \pN, \pS, \pP, \pM, \pZ, \pC
\p{Sk}, \p{Ps}, \p{Lt}
\p{alpha}, \p{upper}, \p{lower}
\p{Latin}, \p{Greek}
\p{script=Latin}, \p{script=Greek}
\p{East_Asian_Width=Wide}, \p{EA=W}
\p{Line_Break=Hyphen}, \p{LB=HY}
\p{Numeric_Value=4}, \p{NV=4}
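A few of the properties listed above can be tried out as follows. This is only a sketch: the helper scripts_in and the sample strings are hypothetical, chosen to show the script, general category, and numeric value properties in action.

```perl
use strict;
use warnings;

# Hypothetical helper: report which of two scripts appear in a string.
sub scripts_in {
    my ($str) = @_;
    my @found;
    push @found, 'Latin' if $str =~ /\p{Latin}/;
    push @found, 'Greek' if $str =~ /\p{script=Greek}/;
    return @found;
}

# Latin letters followed by Greek small alpha and beta.
my @scripts = scripts_in("abc \x{3B1}\x{3B2}");

# \p{NV=4} matches any codepoint whose numeric value is 4, such as
# ASCII "4" and ARABIC-INDIC DIGIT FOUR (U+0664), but not "5".
my @fours = "4 \x{664} 5" =~ /\p{NV=4}/g;

# \p{Lt} matches titlecase letters, such as U+01C5
# (LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON).
my $is_titlecase = "\x{1C5}" =~ /\p{Lt}/ ? 1 : 0;

printf "scripts=%s fours=%d titlecase=%d\n",
    join(',', @scripts), scalar @fours, $is_titlecase;
```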

