
I have a file, file1.pl:

use strict;
use warnings;
use Encode;

# $one is set earlier in the script (not shown here).
my @flist = `svn diff --summarize ...`;
foreach my $file (@flist) {
    my $foo = "$one/$file";
    use bytes;
    print(bytes::length($one)."\n");
    print(bytes::length($file)."\n");
    print(bytes::length($foo)."\n");
}
# Output:
# 76
# 31
# 108

I also have file2.pl, which contains the same core logic, but its output is:

# 76
# 31
# 110 <-- ?

Both source files use the same encoding (ISO-8859-1). To get the same result as in file1.pl, I have to use

my $foo = "$one/".decode('UTF-8', $file);

in file2.pl. What could cause this difference, and why is decode('UTF-8', $file) required in file2.pl? It seems to be related to "What if I don't decode?", but in what way, and why only in file2.pl? Thanks.

Perl v5.10.1

asked Jun 20, 2022 at 13:07
  • You don't appear to have told us in what way file2.pl differs from file1.pl. Or are you saying that they're the same apart from the "my $foo = ..." line? Commented Jun 20, 2022 at 14:26
  • Sorry, this is very difficult to explain. file2.pl is a large file of about 1700 lines of code; file1.pl is an extract of the relevant logic from file2.pl for debugging purposes. But I cannot find the reason why the encoding behaves differently in file1.pl than in file2.pl. The concatenation "$one/$file" evidently behaves differently in the two files with regard to the internal encoding. Commented Jun 20, 2022 at 15:45

1 Answer


Don't use bytes.

The bytes documentation itself says: "Use of this module for anything other than debugging purposes is strongly discouraged."

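bytes::length returns the number of bytes in a string's internal storage, not the number of characters in the string. Perl may store the same string either downgraded (one byte per character, only possible when every character is below 0x100) or upgraded (UTF-8 internally), and which one you get is an implementation detail. So the value bytes::length reports is useless for anything but debugging.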
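A minimal sketch of that point: the same one-character string, once as Perl creates it and once after forcing the upgraded format with the core function utf8::upgrade. length() and eq see no difference; only bytes::length does.

```perl
use strict;
use warnings;
use bytes ();   # load bytes::length without enabling the bytes pragma

# Same one-character string (U+00E9), stored two different ways.
my $s = chr(0xE9);   # created downgraded: one byte per character
my $t = $s;
utf8::upgrade($t);   # switch to UTF-8 internal storage; the string value is unchanged

print length($s), " ", bytes::length($s), "\n";   # 1 1
print length($t), " ", bytes::length($t), "\n";   # 1 2
print $s eq $t ? "equal\n" : "different\n";       # equal
```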


What could be the reason for that difference

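$one and $file contained strings stored in different internal formats: in file2.pl, one of them was upgraded and the other was not. To concatenate them, Perl had to upgrade the downgraded one, and every character in the range 0x80 to 0xFF takes two bytes instead of one in upgraded storage. The characters of the result, and therefore its length(), are unaffected; only bytes::length changes. The two extra bytes (110 instead of 108) are consistent with the downgraded string containing two such characters.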
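A sketch that reproduces the 108 versus 110 effect on a small scale. The values of $one and $file here are hypothetical stand-ins: $one is assumed to have been left upgraded by something elsewhere in file2.pl, and $file is assumed to hold raw UTF-8 bytes, the way backticks return svn output. utf8::is_utf8 is used only to show which format each string is in.

```perl
use strict;
use warnings;
use bytes ();   # loaded only for bytes::length; the pragma is not enabled

# Hypothetical stand-ins for the question's variables.
my $one = "/srv/repo";
utf8::upgrade($one);            # assumption: $one ended up upgraded in file2.pl
my $file = "caf\xC3\xA9.txt";   # raw UTF-8 bytes of "café.txt" (downgraded)

my $foo = "$one/$file";         # result is upgraded, so $file's bytes \xC3 and \xA9 grow

printf "one:  is_utf8=%d bytes::length=%d\n", utf8::is_utf8($one)  ? 1 : 0, bytes::length($one);
printf "file: is_utf8=%d bytes::length=%d\n", utf8::is_utf8($file) ? 1 : 0, bytes::length($file);
printf "foo:  is_utf8=%d bytes::length=%d\n", utf8::is_utf8($foo)  ? 1 : 0, bytes::length($foo);
# one:  is_utf8=1 bytes::length=9
# file: is_utf8=0 bytes::length=9
# foo:  is_utf8=1 bytes::length=21   (9 + 1 + 9, plus one extra byte each for \xC3 and \xA9)
```

Printing utf8::is_utf8 for $one and $file inside the real loop would be one way to see which of the two is upgraded in file2.pl.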

use strict;
use warnings;
use feature qw( say );
use bytes qw( );
use Encode qw( encode );

sub dump_lengths {
    my $s = shift;
    say join " ",
        length( $s ),
        length( encode( "UTF-8", $s ) ),
        bytes::length( $s );
}

                        # +------ Length of string
my $x = chr( 0xE9 );    # | +---- Length of its UTF-8 encoding
my $y = chr( 0x2660 );  # | | +-- Length of internal storage
                        # | | |
dump_lengths( $x );     # 1 2 1
dump_lengths( $y );     # 1 3 3
my $z = $x . $y;
dump_lengths( $z );     # 2 5 5
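The decode('UTF-8', $file) workaround from the question fits this picture. A minimal sketch, assuming svn prints file names as UTF-8 (the path and file name below are hypothetical): once the file name is decoded into characters, length() and string comparison give the same answer whether the directory string happens to be stored downgraded (as in file1.pl) or upgraded (as in file2.pl), and encode turns the result back into bytes when it has to leave the program.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Hypothetical directory name, stored downgraded in one copy and upgraded in the other.
my $one_down = "/srv/repo";
my $one_up   = "/srv/repo";
utf8::upgrade($one_up);

# Hypothetical file name as raw UTF-8 bytes, the way backticks return svn output.
my $raw  = "caf\xC3\xA9.txt";
my $name = decode( 'UTF-8', $raw );   # bytes -> characters: "café.txt"

my $a_path = "$one_down/$name";
my $b_path = "$one_up/$name";

# Character-level operations do not depend on the internal storage format.
print length($name), "\n";                             # 8
print length($a_path), " ", length($b_path), "\n";     # 18 18
print $a_path eq $b_path ? "same\n" : "different\n";   # same

# Encode back to UTF-8 bytes before printing the path or passing it to svn.
my $bytes = encode( 'UTF-8', $b_path );
print length($bytes), "\n";                            # 19
```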
answered Jun 20, 2022 at 17:18