Perl Unicode Cookbook: Further Resources
This series has shown you several features of Unicode by example, as well as several techniques for working with Unicode correctly and easily with recent releases of Perl 5. By now you know more than many programmers do about Unicode, but your journey to mastery continues.
Perl 5 includes several pieces of documentation which explain Unicode and Perl’s Unicode support. See perlunicode, perluniprops, perlre, perlrecharclass, perluniintro, perlunitut and perlunifaq.
Perl 5 and the CPAN provide several modules and distributions to allow the effective use of Unicode. As of Perl 5.16, many of these are in the core library. Many of them work just as well with earlier versions of Perl 5, though for the best and most correct support for Unicode as a whole, consider using Perl 5.14 or 5.16.
These modules include:
The CPAN distribution Unicode::Tussle includes many command-line programs for working with Unicode, including these programs that fully or partly replace standard utilities: tcgrep instead of egrep, uniquote instead of cat -v or hexdump, uniwc instead of wc, unilook instead of look, unifmt instead of fmt, and ucsort instead of sort. For exploring Unicode character names and character properties, see its uniprops, unichars, and uninames programs. It also supplies these programs, all of which are general filters that do Unicode-y things: unititle and unicaps; uniwide and uninarrow; unisupers and unisubs; nfd, nfc, nfkd, and nfkc; and uc, lc, and tc.
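As a quick reminder of what the core library gives you without installing anything from the CPAN, here is a minimal sketch. The choice of modules (Unicode::Normalize, Unicode::Collate, and charnames) and the sample strings are assumptions made for illustration, not a list taken from this article.

```perl
use v5.14;
use strict;
use warnings;
use charnames qw(:full);
use Unicode::Normalize qw(NFD NFC);
use Unicode::Collate;

binmode(STDOUT, ':encoding(UTF-8)');

# Unicode::Normalize: decompose "é" into "e" plus a combining acute accent.
my $composed   = "caf\N{LATIN SMALL LETTER E WITH ACUTE}";
my $decomposed = NFD($composed);
say length $composed;      # 4
say length $decomposed;    # 5
say NFC($decomposed) eq $composed ? "round trip ok" : "round trip failed";

# charnames: look up the name of a code point.
say charnames::viacode(0x263A);    # WHITE SMILING FACE

# Unicode::Collate: sort by the Unicode Collation Algorithm
# rather than by raw code point order.
my @words = (
    "zebra", "Apple", "eclipse", "apple",
    "\N{LATIN SMALL LETTER E WITH ACUTE}clair",
);
my @sorted = Unicode::Collate->new->sort(@words);
say join ", ", @sorted;
```

A plain sort would put "Apple" ahead of every lowercase word and push the accented word to the very end; the collator orders the words the way a reader would expect to find them in a dictionary.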
Finally, see the published Unicode Standard (page numbers are from version 6.0.0), including these specific sections, annexes, and technical reports:
- §3.13 Default Case Algorithms, page 113
- §4.2 Case, pages 120-122
- Case Mappings, pages 166-172, especially Caseless Matching starting on page 170
- UAX #44: Unicode Character Database
- UTS #18: Unicode Regular Expressions
- UAX #15: Unicode Normalization Forms
- UTS #10: Unicode Collation Algorithm
- UAX #29: Unicode Text Segmentation
- UAX #14: Unicode Line Breaking Algorithm
- UAX #11: East Asian Width
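To connect two of these documents back to Perl code, the following sketch touches on UAX #29 (the \X regex escape matches one extended grapheme cluster) and on the caseless matching described in the case sections (the fc function, available from Perl 5.16). The sample strings and variable names are illustrative assumptions.

```perl
use v5.16;    # enables say, fc, and Unicode string semantics
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# UAX #29: a decomposed string has more code points than
# user-perceived characters (grapheme clusters).
my $word        = NFD("cr\x{E8}me br\x{FB}l\x{E9}e");
my @code_points = split //, $word;
my @graphemes   = $word =~ /\X/g;
say scalar @code_points;    # 15
say scalar @graphemes;      # 12

# Caseless matching: full case folding maps U+00DF (sharp s) to "ss",
# so fc() equates strings that lc() keeps apart.
my $german   = "Stra\x{DF}e";
my $shouting = "STRASSE";
say fc($german) eq fc($shouting) ? "fc: match" : "fc: no match";    # fc: match
say lc($german) eq lc($shouting) ? "lc: match" : "lc: no match";    # lc: no match
```

The strings use \x{...} escapes so that the example does not depend on the encoding of the source file.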
Tom Christiansen <email@example.com> wrote this series, with occasional kibitzing from Larry Wall and Jeffrey Friedl in the background.
Most of these examples came from the current edition of the “Camel Book”; that is, from the 4th Edition of Programming Perl, Copyright © 2012 Tom Christiansen et al., 2012-02-13 by O’Reilly Media. The code itself is freely redistributable, and you are encouraged to transplant, fold, spindle, and mutilate any of the examples in this series however you please for inclusion into your own programs without any encumbrance whatsoever. Acknowledgement via code comment is polite but not required.
Previous: ℞ 44: Demo of Unicode Collation and Printing
Series Index: The Standard Preamble