℞ 21: Unicode case-insensitive comparisons
Unicode is more than an expanded character set. Unicode is a set of rules about how characters behave and a set of properties about each character.
Comparing strings for equivalence often requires normalizing them to a
standard form. That normalized form often requires that all characters be in a
specific case. ℞ 20: Unicode casing demonstrated that converting between
upper- and lower-case Unicode characters is more complicated than simply
mapping [A-Z] to [a-z]. (Remember also that many characters have a title case
form!)
The proper solution for normalized comparisons is to perform casefolding
instead of mapping one subset of characters to another. Perl 5.16 added a
new feature, fc(), or "foldcase", which performs the casefolding that the
/i pattern modifier has always provided. This feature is available
for older Perls thanks to the CPAN module Unicode::CaseFold:
use feature "fc"; # fc() function is from v5.16
# OR
use Unicode::CaseFold;
# sort case-insensitively
my @sorted = sort { fc($a) cmp fc($b) } @list;
# both are true:
fc("tschüß") eq fc("TSCHÜSS")
fc("Σίσυφος") eq fc("ΣΊΣΥΦΟΣ")
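To see why casefolding matters, it can help to compare fc() against plain lc() on the same two pairs. The following is a minimal sketch, assuming Perl v5.16 or later and UTF-8 source text with precomposed characters; the helper name same_ignoring_case is hypothetical and only exists for illustration.

```perl
use strict;
use warnings;
use utf8;                      # string literals below are UTF-8 source text
use feature qw(fc say);        # fc() needs Perl v5.16 or later
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: true when two strings are equal after casefolding.
sub same_ignoring_case {
    my ($left, $right) = @_;
    return fc($left) eq fc($right);
}

my @pairs = (
    [ "tschüß",  "TSCHÜSS" ],
    [ "Σίσυφος", "ΣΊΣΥΦΟΣ" ],
);

for my $pair (@pairs) {
    my ($x, $y) = @$pair;
    say "$x / $y";
    # lc() leaves "ß" and the final sigma "ς" alone, so the strings differ
    say "  lc: ", (lc($x) eq lc($y) ? "equal" : "different");
    # fc() folds "ß" to "ss" and "ς" to "σ", so the strings match
    say "  fc: ", (same_ignoring_case($x, $y) ? "equal" : "different");
}
```

Lowercasing both sides looks like it should work, but it fails for exactly the characters whose case mappings are not one-to-one. Folding both sides with fc() is the comparison that behaves like a /i match.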
The article "Fold cases properly" goes into more detail about case folding in Perl.

