Perl Unicode Cookbook: Reverse String by Grapheme
℞ 32: Reverse string by grapheme
Because bytes and characters are not isomorphic in Unicode, and because what you may see as a user-visible character (a grapheme) is not necessarily a single codepoint in a Unicode string, every string operation must be aware of the difference between codepoints and graphemes.
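To make the codepoint/grapheme distinction concrete, here is a minimal sketch that counts both in a decomposed spelling of "crème". It assumes only core Perl: length counts codepoints, while matching \X in list context yields one element per grapheme cluster.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# "crème" in decomposed form: the grave accent is a separate
# combining codepoint (U+0300) that follows the base letter "e".
my $word = "cre\x{300}me";

# length() counts codepoints; assigning the \X matches to an empty list
# in scalar context yields the number of grapheme clusters.
my $codepoints = length $word;
my $graphemes  = () = $word =~ /\X/g;

print "$word: $codepoints codepoints, $graphemes graphemes\n";
# prints: crème: 6 codepoints, 5 graphemes
```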
Consider the Perl builtin reverse. Reversing a string by codepoints messes up diacritics whenever they are stored as separate combining characters, mistakenly converting crème brûlée into éel̂urb em̀erc instead of into eélûrb emèrc; so reverse by grapheme instead.
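The failure is easy to reproduce. This sketch builds the decomposed (NFD) form of crème brûlée with the core Unicode::Normalize module, then applies the builtin reverse in scalar context, which reverses codepoints rather than graphemes:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

binmode STDOUT, ':encoding(UTF-8)';

# Decompose so each accent is stored as a combining mark after its base letter.
my $decomposed = NFD("cr\x{E8}me br\x{FB}l\x{E9}e");

# Scalar-context reverse flips codepoints, so each combining mark now
# follows, and visually attaches to, the wrong base character.
my $by_codepoint = reverse $decomposed;

print "$by_codepoint\n";   # displays as: éel̂urb em̀erc
```

Each combining mark ends up after the character that used to follow its base letter, which is how the accent migrates from è onto m and from û onto l.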
As one option, use Perl’s \X regex metacharacter to extract graphemes from a string, then reverse that list:
$str = join("", reverse $str =~ /\X/g);
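Wrapped as a subroutine, the \X recipe might look like the sketch below. The name reverse_graphemes is hypothetical, chosen only for illustration; the input is written with \x{...} escapes in decomposed form so the source file needs no particular encoding.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: reverse a string one grapheme cluster at a time.
sub reverse_graphemes {
    my ($str) = @_;
    # In list context, //g returns every \X match (one per grapheme cluster);
    # reverse flips that list and join glues the clusters back together.
    return join "", reverse $str =~ /\X/g;
}

# "crème brûlée" in decomposed form (base letters followed by combining marks).
my $input    = "cre\x{300}me bru\x{302}le\x{301}e";
my $reversed = reverse_graphemes($input);

print "$reversed\n";   # each accent stays on its own letter: eélûrb emèrc
```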
As another option, use the CPAN module Unicode::GCString (part of the Unicode-LineBreak distribution) to treat a string as a sequence of graphemes, not codepoints:
use Unicode::GCString;
# Split the string into grapheme clusters, reverse that list, and rejoin it
$str = join("", reverse Unicode::GCString->new($str)->as_array);
Both these approaches work correctly no matter what normalization form the string is in. Remember that \X is most reliable in Perl 5.12 and later.
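One way to check the normalization claim is to reverse the same text in both NFC and NFD and compare the results after normalizing them to a common form. This sketch assumes the core Unicode::Normalize module and uses the \X recipe:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

# "crème brûlée" with precomposed letters (NFC form).
my $composed = "cr\x{E8}me br\x{FB}l\x{E9}e";

# Reverse the same text in NFC and in NFD, grapheme by grapheme.
my %reversed;
for my $form (qw(NFC NFD)) {
    my $input = $form eq 'NFC' ? NFC($composed) : NFD($composed);
    $reversed{$form} = join "", reverse $input =~ /\X/g;
}

# Once both results are normalized the same way, they are identical.
my $same = NFC($reversed{NFC}) eq NFC($reversed{NFD});
print $same ? "same result in both forms\n" : "results differ\n";
```

Comparing after NFC matters: the two reversed strings are canonically equivalent but not codepoint-identical, since one keeps precomposed letters and the other keeps combining marks.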