Perl Unicode Cookbook: Case- and Accent-insensitive Comparison

℞ 39: Case- and accent-insensitive comparisons

As you've noticed by now, many Unicode strings have multiple possible representations. Comparing two Unicode strings for equality requires far more than merely comparing their codepoints. Not only must you account for multiple representations, you must decide which types of differences are significant: do you care about the case of individual characters? How about the presence or absence of accents?

Use a collator object to compare Unicode text by character instead of by codepoint. To perform comparisons without regard for case or accent differences, choose the appropriate comparison level: level 1 considers only base characters, so differences in accents and case are both ignored. Unicode::Collate's eq() method offers customizable Unicode-aware equality:

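 use utf8;              # the string literals below are UTF-8 (see the standard preamble)
 use Unicode::Collate;

 # level 1 compares base characters only
 my $es = Unicode::Collate->new(
     level         => 1,
     normalization => undef,
 );

 # now both are true:
 $es->eq("García",  "GARCIA" );
 $es->eq("Márquez", "MARQUEZ");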
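To see what "choose the appropriate comparison level" means in practice, here is a minimal sketch that builds one collator per level and compares the same pairs of strings at each. The helper name same_at_level and the hash %collator_for are invented for this illustration; the constructor options are the ones used in the recipe. It assumes the default collation table shipped with Unicode::Collate, in which accents are a secondary (level 2) difference and case is a tertiary (level 3) difference.

```perl
use utf8;                      # this source file contains UTF-8 literals
use strict;
use warnings;
use Unicode::Collate;

# One collator per comparison level, with the same options as the recipe.
my %collator_for;
for my $level (1 .. 3) {
    $collator_for{$level} = Unicode::Collate->new(
        level         => $level,
        normalization => undef,
    );
}

# Hypothetical helper: returns 1 if $x and $y compare equal at $level, else 0.
sub same_at_level {
    my ($level, $x, $y) = @_;
    return $collator_for{$level}->eq($x, $y) ? 1 : 0;
}

for my $level (1 .. 3) {
    printf "level %d: accent+case differ => %d, case differs => %d\n",
        $level,
        same_at_level($level, "García", "GARCIA"),   # differs in accent and case
        same_at_level($level, "García", "GARCÍA");   # differs in case only
}
```

Under those assumptions, the first pair should compare equal only at level 1, and the second pair at levels 1 and 2. If you want to ignore case but still respect accents, level 2 is the one to try.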

