Perl Unicode Cookbook: Convert non-ASCII Unicode Numerics
May 21, 2012 by Tom Christiansen
℞ 28: Convert non-ASCII Unicode numerics
Unicode digits encompass far more than the ASCII characters 0 - 9.
Unless you’ve used the /a or /aa modifier, \d matches more than ASCII digits only. That’s good! Unfortunately, Perl’s implicit string-to-number conversion does not currently recognize Unicode digits. Here’s how to convert such strings manually.
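To make the gap concrete, here is a minimal sketch that compares implicit numification with the num() function from Unicode::UCD that this recipe describes. The variable names are illustrative only, and the digits are written as \x{...} escapes so that the snippet does not depend on the encoding of the source file.

```perl
use v5.14;
use Unicode::UCD qw(num);

# DEVANAGARI DIGITS FOUR, FIVE, SIX, SEVEN (U+096A..U+096D)
my $deva = "\x{096A}\x{096B}\x{096C}\x{096D}";

# The regex engine treats these characters as digits...
my $is_digits = $deva =~ /\A\d+\z/ ? 1 : 0;

# ...but implicit numification finds no leading ASCII digits and yields 0.
my $implicit = do { no warnings 'numeric'; $deva + 0 };

# num() consults the Unicode character database instead.
my $explicit = num($deva);

say "matches \\d+: $is_digits";   # 1
say "implicit:    $implicit";     # 0
say "explicit:    $explicit";     # 4567
```

The string matches \d+ yet numifies to 0 on its own, which is why the explicit conversion step is needed.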
As usual, the Unicode::UCD module provides access to the Unicode character database. Its num() function can numify Unicode digits, and strings of Unicode digits.
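use v5.14; # needed for num() function
use utf8;  # the string literal below contains non-ASCII characters
use Unicode::UCD qw(num);

my $str = "got Ⅻ and ४५६७ and ⅞ and here";
my @nums = ();
while ($str =~ /(\d+|\N)/g) { # not just ASCII!
    push @nums, num($1) // (); # skip anything without a numeric value
}
say "@nums"; # 12 4567 0.875

use charnames qw(:full);
# num() returns undef when it does not consider the whole string a safe number
my $nv = num("\N{RUMI DIGIT ONE}\N{RUMI DIGIT TWO}");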
As the documentation for num() warns, the function errs on the side of safety. Not all collections of Unicode digits form valid numbers, and num() returns undef for any string it does not consider a safe number. You may also want to normalize complex Unicode strings before numifying them.
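The following sketch illustrates both points under stated assumptions: that num() rejects digits drawn from two different scripts, and that it rejects multi-character strings whose characters are numeric but not \d digits. It then shows how compatibility normalization with NFKC from Unicode::Normalize can turn one such string into something num() accepts. The sample strings and variable names are illustrative; consult the Unicode::UCD documentation for the exact rules.

```perl
use v5.14;
use Unicode::UCD qw(num);
use Unicode::Normalize qw(NFKC);

# ASCII "4" followed by DEVANAGARI DIGIT FIVE: digits from two scripts.
my $mixed = num("4\x{096B}");

# CIRCLED DIGIT ONE, CIRCLED DIGIT TWO: numeric characters, but not \d digits.
my $circled = "\x{2460}\x{2461}";
my $raw     = num($circled);

# NFKC replaces the circled digits with their ASCII compatibility forms, "12".
my $folded  = num(NFKC($circled));

say "mixed:  ", $mixed  // "undef";   # undef
say "raw:    ", $raw    // "undef";   # undef
say "folded: ", $folded // "undef";   # 12
```

Whether compatibility normalization is appropriate depends on the data. NFKC discards distinctions such as circling or superscripting, which may or may not matter for a given application.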
Previous: ℞ 27: Unicode Normalization
Series Index: The Standard Preamble