\$\begingroup\$

Description

This script reads domain names from STDIN, one per line, and converts Unicode (internationalized) domains to Punycode.

Features

  • Domains that throw an error during conversion are silently skipped.
  • ASCII domains pass through unchanged.

convert.pl

#!/usr/bin/perl -Wn
use strict;
use Try::Tiny;
use Net::IDN::Encode ':all';
use open ':std', ':encoding(UTF-8)';
try {
 chomp $_;
 printf "%s\n",domain_to_ascii $_;
}

Sample Input:

дольщикиспб.рф
шляхтен.рф
สารสกัดจากสมุนไพร.com
google.com

Sample Output:

xn--90afmajeumr0f6a.xn--p1ai
xn--e1alhsoq4c.xn--p1ai
xn--12cau1c1a4atlh5dbe1gkg3hzj.com
google.com

I'm open to any feedback!

asked Jul 3, 2021 at 17:18
\$\endgroup\$
  • "I'm wondering if it would be more efficient to check if a domain is unicode or not..." The function domain_to_ascii already does that check, see the source line 46. Commented Jul 6, 2021 at 5:27
  • "Any domains that convert to punycode >255 characters and throw an error get ignored" Where did you get the number 255 from? I could not find that limit in the source. Commented Jul 6, 2021 at 5:39
  • @HåkonHægland Yeah I noticed that too. I read somewhere the standard is that punycode domains <255 characters long were valid; can't remember where though lol Commented Jul 6, 2021 at 5:47

1 Answer

\$\begingroup\$

Some small comments. The program uses this shebang line:

#!/usr/bin/perl -Wn

The shebang is used when the script is run as a command from the shell. Here, /usr/bin/perl runs the script. This is the so-called system perl that ships with a Unix-like operating system. However, users often install other perl executables alongside the system perl, for example with perlbrew, and they usually want to run your script with whichever Perl interpreter they have currently selected, be it the system perl or a perlbrew-installed one. Typically, the user sets the PATH environment variable so that the shell finds the right perl. The shebang line can honor that choice if you change it to

#!/usr/bin/env perl

Now the script is more portable, since it adapts to the current user's settings. There is one complication, however: you cannot portably pass arguments to perl in the shebang line when using /usr/bin/env. On Linux, for instance, everything after the interpreter path is handed to env as a single argument, so env would look for a program literally named "perl -Wn". Your script tries to pass the options -Wn to perl, which therefore cannot be done in a portable way; see "Why am I able to pass arguments to /usr/bin/env in this case?".

Luckily, it is seldom necessary to pass arguments to perl in the shebang line. Both -W and -n are better handled inside the Perl script itself. Instead of passing -W to perl, you can enable the warnings pragma with use warnings. Similarly, -n merely wraps your script in a loop that reads the input line by line, which is easy to write out explicitly in the script.
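As a minimal sketch of those two replacements, using only core Perl: the helper name each_line and the in-memory input below are made up for illustration, and a real script would pass \*STDIN (or simply loop with while (<>)) instead.

```perl
#!/usr/bin/env perl
use strict;
use warnings;    # in-script replacement for the -W switch

# Hypothetical helper: the explicit read loop that -n would otherwise
# wrap around the script implicitly. Calls $callback once per line.
sub each_line {
    my ( $fh, $callback ) = @_;
    while ( my $line = <$fh> ) {
        chomp $line;
        $callback->($line);
    }
    return;
}

# Demo on an in-memory filehandle so the sketch runs without any input;
# a real script would pass \*STDIN here instead.
open my $in, '<', \"google.com\nexample.org\n" or die "open failed: $!";

my @seen;
each_line( $in, sub { push @seen, $_[0] } );
print "$_\n" for @seen;
```

One difference worth knowing: use warnings is lexically scoped, so it only covers the file it appears in, whereas -W forces warnings on everywhere, including in modules that switch them off.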

Another thing that helps document your program (and thus makes it easier to maintain) is to include some unit tests that describe its expected behavior. For example:

p.pl:

#! /usr/bin/env perl
use feature qw(say);
use open ':std', ':encoding(UTF-8)';
use warnings;
use strict;
use Try::Tiny;
use Net::IDN::Encode 'domain_to_ascii';

# Written as a modulino: See Chapter 17 in "Mastering Perl". Executes main() if
# run as script, otherwise, if the file is imported from the test scripts,
# main() is not run.
main() unless caller;

sub main {
    while (<>) {
        my $line = parse_line($_);
        # Skip domains that fail to convert rather than stopping at the first
        # failure, matching the "errors get ignored" behavior of the original.
        next if !defined $line;
        say $line;
    }
}

# Returns the Punycode form of a domain, or undef if the conversion fails.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my $result = try {
        domain_to_ascii( $line );
    };
    return $result;
}

t/main.t:

use strict;
use warnings;
use utf8;
use open ':std', ':encoding(utf-8)';
use Test2::V0;
use lib '.';
require "p.pl";

{
    subtest "basic" => \&basic;
    subtest "fails" => \&fails;
    # TODO: Complete the test suite..
    done_testing;
}

sub basic {
    my @data = (
        ['дольщикиспб.рф', 'xn--90afmajeumr0f6a.xn--p1ai'],
        ['สารสกัดจากสมุนไพร.com', 'xn--12cau1c1a4atlh5dbe1gkg3hzj.com'],
        ['шляхтен.рф', 'xn--e1alhsoq4c.xn--p1ai'],
        ['google.com', 'google.com'],
    );
    my $i = 1;
    for my $item (@data) {
        my ($input, $output) = @$item;
        is(parse_line($input), $output, "basic $i");
        $i++;
    }
}

sub fails {
    is(parse_line("...."), U(), "empty label");
    is(parse_line("1234567890123456789012345678901234567890123456789012345678901234"), U(), "label too long (max 63 characters)");
}

You can run the tests like this:

$ prove t
t/main.t .. ok 
All tests successful.
Files=1, Tests=2, 0 wallclock secs ( 0.01 usr 0.00 sys + 0.07 cusr 0.01 csys = 0.09 CPU)
Result: PASS

or like this:

$ perl t/main.t
# Seeded srand with seed '20210711' from local date.
ok 1 - basic {
    ok 1 - basic 1
    ok 2 - basic 2
    ok 3 - basic 3
    ok 4 - basic 4
    1..4
}
ok 2 - fails {
    ok 1 - empty label
    ok 2 - label too long (max 63 characters)
    1..2
}
1..2
answered Jul 11, 2021 at 20:15
\$\endgroup\$
