Perl Unicode Cookbook: Unicode Locale Collation

℞ 37: Unicode locale collation

As you've already seen, Unicode-aware sorting respects Unicode character properties. You can't sort by codepoint and expect to get accurate results, not even if you stick with pure ASCII.

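The world is a complicated place. Some locales have their own special sorting rules. German phone books, for example, sort ä as though it were written ae, while German dictionaries sort it alongside a.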

The module Unicode::Collate::Locale provides a sort() method which supports locale-specific rules:

 use Unicode::Collate::Locale;

 my $col  = Unicode::Collate::Locale->new(locale => "de__phonebook");
 my @list = $col->sort(@old_list);
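To see what a locale tailoring changes, it can help to sort the same list twice. The following is a minimal sketch, not part of the original recipe: the four names are made-up sample data, and the "de" collator is assumed here to stand in for ordinary German dictionary order (ö sorts with o), while "de__phonebook" treats ö like oe.

```perl
use utf8;        # the string literals below contain non-ASCII characters
use strict;
use warnings;
use Unicode::Collate::Locale;

binmode(STDOUT, ":encoding(UTF-8)");

# Hypothetical sample data, chosen so that the two orderings differ.
my @names = ("Goldmann", "Götz", "Goethe", "Göbel");

# Dictionary-style order: ö is compared as o plus an accent.
my $standard  = Unicode::Collate::Locale->new(locale => "de");

# Phonebook order: ö is compared as though it were spelled oe.
my $phonebook = Unicode::Collate::Locale->new(locale => "de__phonebook");

my @by_standard  = $standard->sort(@names);
my @by_phonebook = $phonebook->sort(@names);

print "standard:  @by_standard\n";   # Göbel Goethe Goldmann Götz
print "phonebook: @by_phonebook\n";  # Göbel Goethe Götz Goldmann
```

Under phonebook rules Götz is compared as Goetz, so it moves ahead of Goldmann; under dictionary rules it is compared as Gotz and stays last.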

Unicode::Collate::Locale is part of the Perl 5 core distribution as of Perl 5.14. If you're using an older version of Perl, install a recent Unicode::Collate distribution from the CPAN to take advantage of it.

The ucsort program mentioned in Perl Unicode recipe 35 accepts a --locale parameter.

Previous: ℞ 36: Case- and Accent-insensitive Sorting

Series Index: The Standard Preamble

Next: ℞ 38: Make cmp Work on Text instead of Codepoints
