Description
This script reads domains from STDIN, one per line, and converts Unicode domains into Punycode.
Features
- Any domains that throw an error get ignored.
- ASCII domains pass through unchanged.
convert.pl
#!/usr/bin/perl -Wn
use strict;
use Try::Tiny;
use Net::IDN::Encode ':all';
use open ':std', ':encoding(UTF-8)';
try {
    chomp $_;
    printf "%s\n", domain_to_ascii $_;
}
Sample Input:
дольщикиспб.рф
шляхтен.рф
สารสกัดจากสมุนไพร.com
google.com
Sample Output:
xn--90afmajeumr0f6a.xn--p1ai
xn--e1alhsoq4c.xn--p1ai
xn--12cau1c1a4atlh5dbe1gkg3hzj.com
google.com
I'm open to any feedback!
1 Answer
A few small comments. The program uses this shebang line:
#!/usr/bin/perl -Wn
The shebang is used when the script is run as a command from the shell. In this case /usr/bin/perl is used to run the script. This is the so-called system perl that comes with a Unix-like operating system. However, users often install other perl executables in addition to the system perl, for example with perlbrew. In that case, the user would probably like to run your script with their current choice of Perl interpreter, which might be the system perl or a perlbrew-installed perl. Typically, the user arranges for the PATH environment variable to be set so that the shell finds the correct perl. The shebang line can do the same thing if you change it to
#!/usr/bin/env perl
Now the script is more portable, since it adapts to the current user's settings. However, there is one complication: arguments to perl cannot be passed portably in the shebang line when using /usr/bin/env. Your script passes the options -Wn to perl, so it is affected by this; see
Why am I able to pass arguments to /usr/bin/env in this case?
Luckily, it is seldom necessary to pass arguments to perl in the shebang line. Both -W and -n are better enabled from within the Perl script itself. Instead of passing -W to perl, you could use the warnings pragma. Similarly, the -n option wraps your script in a loop that reads the input (STDIN, or files named on the command line) line by line, and that loop is easy to write in the script itself.
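As a rough illustration of the two replacements, here is a minimal sketch that uses only core Perl. The helper name each_line() and the demo input are made up for this example and are not part of your script. Note that `use warnings` is lexically scoped, whereas -W forces all warnings on everywhere, so the two are close but not identical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;    # lexically scoped replacement for the -W switch

# each_line() is a hypothetical helper for this sketch. It spells out the
# loop that -n would otherwise wrap around the whole script.
sub each_line {
    my ($fh, $handler) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        $handler->($line);
    }
    return;
}

# Demo input from an in-memory filehandle; a real script would pass \*STDIN.
open my $demo, '<', \"google.com\nexample.org\n" or die "open failed: $!";
each_line($demo, sub { print "$_[0]\n" });
close $demo;
```

Passing the filehandle in as an argument is a design choice for the sketch: it keeps the loop easy to exercise from a test without touching the real STDIN.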
Another thing that could help document your program (and thus make it easier to maintain) is to include some unit tests that describe the expected behavior of the program. For example:
p.pl:
#! /usr/bin/env perl
use feature qw(say);
use open ':std', ':encoding(UTF-8)';
use warnings;
use strict;
use Try::Tiny;
use Net::IDN::Encode 'domain_to_ascii';
# Written as a modulino: See Chapter 17 in "Mastering Perl". Executes main() if
# run as script, otherwise, if the file is imported from the test scripts,
# main() is not run.
main() unless caller;
sub main {
    while (<>) {
        my $line = parse_line($_);
        next if !defined $line;    # skip domains that fail to convert
        say $line;
    }
}
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my $result = try {
        domain_to_ascii( $line );
    };
    return $result;
}
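The modulino idea mentioned in the comment above can be seen in isolation in the following minimal sketch. The function shout() is hypothetical and only stands in for parse_line(); the point is the `main() unless caller;` line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# shout() is a hypothetical function, standing in for parse_line().
sub shout {
    my ($text) = @_;
    return uc $text;
}

sub main {
    print shout('run as a script'), "\n";
    return;
}

# caller() is false at the top level of a file that is run directly, and
# true when the file is loaded with require, so main() runs only in the
# first case.
main() unless caller;

1;    # makes the true return value that require expects explicit
```

A test file can then `require` this file and call shout() directly without triggering main().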
t/main.t:
use strict;
use warnings;
use utf8;
use open ':std', ':encoding(utf-8)';
use Test2::V0;
use lib '.';
require "p.pl";
{
    subtest "basic" => \&basic;
    subtest "fails" => \&fails;
    # TODO: Complete the test suite..
    done_testing;
}
sub basic {
    my @data = (
        ['дольщикиспб.рф', 'xn--90afmajeumr0f6a.xn--p1ai'],
        ['สารสกัดจากสมุนไพร.com', 'xn--12cau1c1a4atlh5dbe1gkg3hzj.com'],
        ['шляхтен.рф', 'xn--e1alhsoq4c.xn--p1ai'],
        ['google.com', 'google.com']
    );
    my $i = 1;
    for my $item (@data) {
        my ($input, $output) = @$item;
        is(parse_line($input), $output, "basic $i");
        $i++;
    }
}
sub fails {
    is(parse_line("...."), U(), "empty label");
    is(parse_line("1234567890123456789012345678901234567890123456789012345678901234"), U(), "label too long (max 63 characters)");
}
You can run the tests like this:
$ prove t
t/main.t .. ok
All tests successful.
Files=1, Tests=2, 0 wallclock secs ( 0.01 usr 0.00 sys + 0.07 cusr 0.01 csys = 0.09 CPU)
Result: PASS
or like this:
$ perl t/main.t
# Seeded srand with seed '20210711' from local date.
ok 1 - basic {
ok 1 - basic 1
ok 2 - basic 2
ok 3 - basic 3
ok 4 - basic 4
1..4
}
ok 2 - fails {
ok 1 - empty label
ok 2 - label too long (max 63 characters)
1..2
}
1..2
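The "fails" subtest depends on parse_line() returning undef when the conversion throws, which is also what lets the main loop ignore bad domains. Here is a core-only sketch of that contract. It swaps in a plain `eval` block for Try::Tiny and uses a hypothetical stand-in, to_ascii_or_die(), for domain_to_ascii; the stand-in only rejects empty and over-long labels and does no real Punycode conversion.

```perl
use strict;
use warnings;

# to_ascii_or_die() is a hypothetical stand-in for domain_to_ascii. It only
# rejects empty and over-long labels so the sketch runs without CPAN modules.
sub to_ascii_or_die {
    my ($domain) = @_;
    for my $label (split /\./, $domain, -1) {
        die "empty label\n"    if length($label) == 0;
        die "label too long\n" if length($label) > 63;
    }
    return $domain;
}

# Returns the converted domain, or undef if the conversion died.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my $result = eval { to_ascii_or_die($line) };
    return $result;    # undef when eval caught an exception
}

for my $input ('google.com', '....', 'example.org') {
    my $out = parse_line($input);
    next unless defined $out;    # skip invalid input and keep going
    print "$out\n";
}
```

With this shape, an invalid line in the middle of the input does not stop the lines after it from being processed.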
domain_to_ascii already does that check, see line 46 of the source.