Encode - String bytes length

Question 1

I have a file, file1.pl:

use strict;
use warnings;
use Encode;

my @flist = `svn diff --summarize ...`;
foreach my $file (@flist) {
    my $foo = "$one/$file";
    use bytes;
    print(bytes::length($one)."\n");
    print(bytes::length($file)."\n");
    print(bytes::length($foo)."\n");
}
# 76
# 31
# 108

and file2.pl with the same main logic. But in file2.pl the output is:

# 76
# 31
# 110 <-- ?

Both files have the same encoding (ISO-8859-1). To get the same result as in file1.pl, I have to use

my $foo = "$one/".decode('UTF-8', $file);

in file2.pl. What could be the reason for that difference, or for needing decode('UTF-8', $file) in file2.pl? It seems to be related to "What if I don't decode?", but in what way, and why only in file2.pl? Thanks.

Perl v5.10.1

Comment 1

You don't appear to have told us in what way file2.pl differs from file1.pl. Or are you saying that they're the same apart from the "my $foo = ..." line?

Comment 2

Sorry, this is very difficult to explain. file2.pl is a large file of about 1700 lines of code. file1.pl is an extract of the relevant logic of file2.pl, made for debugging purposes. But I cannot find the reason why the encoding behaves differently in file1.pl than in file2.pl. The concatenation "$one/$file" evidently works differently in file1.pl and file2.pl with regard to the internal encoding.

Answer

Don't use bytes. Its documentation says:

"Use of this module for anything other than debugging purposes is strongly discouraged."

bytes::length gets the length of the internal storage of a string. It's useless.
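To illustrate why the internal storage length says little about the string itself, here is a minimal sketch (the variable names are made up for this example). It builds the same four-character string twice, once in each internal storage format, using the core function utf8::upgrade:

```perl
use strict;
use warnings;
use feature qw( say );

use bytes qw( );

my $downgraded = "caf\xE9";      # "café", stored as one byte per character
my $upgraded   = $downgraded;
utf8::upgrade( $upgraded );      # Same characters, now stored internally as UTF-8

say $downgraded eq $upgraded ? "equal" : "different";                # equal
say length( $downgraded ), " ", length( $upgraded );                 # 4 4
say bytes::length( $downgraded ), " ", bytes::length( $upgraded );   # 4 5
```

The two strings compare equal and have the same length in characters. Only bytes::length tells them apart, because it reports how Perl happens to store the string.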

You asked: "What could be the reason for that difference"

$one and $file contained strings stored using different internal storage formats. One needed to be converted for a concatenation to occur.

use strict;
use warnings;
use feature qw( say );

use bytes  qw( );
use Encode qw( encode );

sub dump_lengths {
    my $s = shift;
    say
        join " ",
            length( $s ),
            length( encode( "UTF-8", $s ) ),
            bytes::length( $s );
}

                         # +------ Length of string
my $x = chr( 0xE9 );     # | +---- Length of its UTF-8 encoding
my $y = chr( 0x2660 );   # | | +-- Length of internal storage
                         # | | |
dump_lengths( $x );      # 1 2 1
dump_lengths( $y );      # 1 3 3

my $z = $x . $y;
dump_lengths( $z );      # 2 5 5
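
The numbers in the question fit this explanation. What follows is only a sketch under two assumptions that the question does not confirm: that in file2.pl $one was stored internally as UTF-8, and that $file held the raw UTF-8 bytes of a name containing one non-ASCII character. All names and values are hypothetical stand-ins, shortened for readability:

```perl
use strict;
use warnings;
use feature qw( say );

use bytes  qw( );
use Encode qw( decode );

# Hypothetical stand-ins for the question's variables.
my $file           = "f\xC3\xBCr.txt";   # Raw bytes from a pipe: UTF-8 encoding of "für.txt"
my $one_downgraded = "dir";              # Stored as one byte per character (assumed file1.pl case)
my $one_upgraded   = "dir";
utf8::upgrade( $one_upgraded );          # Stored internally as UTF-8 (assumed file2.pl case)

sub storage_length { return bytes::length( $_[0] ) }

say storage_length( $file );                                         # 8
say storage_length( "$one_downgraded/$file" );                       # 12
say storage_length( "$one_upgraded/$file" );                         # 14
say storage_length( "$one_upgraded/" . decode( "UTF-8", $file ) );   # 12
```

Under these assumptions the mixed concatenation is 2 bytes longer, which matches 110 versus 108 in the question, and decoding $file first removes the difference. The decoded version is also the only one holding the intended characters: in the mixed concatenation, the two bytes that encode ü are treated as two separate characters.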

ikegami · Accepted Answer · 2022-06-20 17:18:19Z
