Perl Unicode Cookbook: Convert non-ASCII Unicode Numerics

℞ 28: Convert non-ASCII Unicode numerics

Unicode digits encompass far more than the ASCII characters 0 - 9.

Unless you’ve used /a or /aa, \d matches more than ASCII digits only. That’s good! Unfortunately, Perl’s implicit string-to-number conversion does not currently recognize Unicode digits. Here’s how to convert such strings manually.

As usual, the Unicode::UCD module provides access to the Unicode character database. Its num() function can numify Unicode digits—and strings of Unicode digits.

 use v5.14;  # needed for num() function
 use Unicode::UCD qw(num);
 my $str = "got Ⅻ and ४५६७ and ⅞ and here";
 my @nums = ();
 while (/$str =~ (\d+|\N)/g) {  # not just ASCII!
    push @nums, num($1);
 }
 say "@nums";   #     12      4567      0.875

 use charnames qw(:full);
 my $nv = num("\N{RUMI DIGIT ONE}\N{RUMI DIGIT TWO}");

As num()’s documentation warns, the function errs on the side of safety. Not all collections of Unicode digits form valid numbers. As well, you may consider normalizing complex Unicode strings before performing numification.

Previous: ℞ 27: Unicode Normalization

Series Index: The Standard Preamble

Next: ℞ 29: Match Unicode Grapheme Cluster in Regex

Tags

Feedback

Something wrong with this article? Help us out by opening an issue or pull request on GitHub