I want to optimize a Perl function which is frequently used in my application. The function creates a special datastructure from the results of DBI::fetchall_arrayref
which looks like:
$columns = ['COLNAME_1','COLNAME_2','COLNAME_3']
$rows = [ ['row_1_col_1', 'row_1_col_2', 'row_1_col_3'],
['row_2_col_1', 'row_2_col_2', 'row_2_col_3'],
['row_3_col_1', 'row_3_col_2', 'row_3_col_3']
];
The new datastructure must contain the data in the following form (all row-values for every column in a single arrayref)
$retval = {
row_count => 3,
col_count => 3,
COLNAME_1 => ['row_1_col_1', 'row_2_col_1', 'row_3_col_1' ],
COLNAME_2 => ['row_1_col_2', 'row_2_col_2', 'row_3_col_2' ],
COLNAME_3 => ['row_1_col_3', 'row_2_col_3', 'row_3_col_3' ]
}
The new datastructure is a Hash of Arrays and is used in the whole application. I cannot change the format (its too frequently used). I wrote a function for this conversion. I've already done some some performance optimization after profiling my application. But it's not enough. Now the function looks like:
sub reorganize($$) {
my ($self,$columns,$rows) = @_;
my $col_count = scalar(@$columns);
my $row_count = scalar(@$rows);
my $col_index = 0;
my $row_index = 0;
my $retval = { # new datastructure
row_count => $row_count,
col_count => $col_count
};
# iterate through all columns
for($col_index=0; $col_index<$col_count; $col_index++) {
# create a arrayref for all row-values of the current column
# set it to the correct size and assign all values to this arrayref
my $tmp = [];
$#{$tmp} = $row_count-1; # set size of array to the number of rows
# iterate through all rows
for($row_index=0; $row_index<$row_count; $row_index++) {
# assign values to arrayref (which has the correct size) instead of a "slow" push
$tmp->[$row_index] = $rows->[$row_index][$col_index];
}
# Assign the arrayref to the hash. The hash-key is the name of the column
$retval->{$columns->[$col_index]} = $tmp;
}
return $retval;
}
My Question:
Is there a way to further optimize this function (maybe using $[...])? I found some hints here at page 18 and 19, but I don't have any experience in using $ in different contexts.
I have to say that the function listed above is the best I can do. There may be other ways to do some optimization which I have never heard of.
1 Answer 1
The following code is about 35% faster (measured with Benchmark). The tricks:
no anonymous array created for
$tmp
.explicit
return
removed.variables created in place where their value is needed.
Some of the tricks added just a 3%, the first one seemed the most important. YMMV.
I experimented with $_
and map
s, too, but it seems the plain old C-style loop is the fastest.
sub faster {
my ($self, $columns, $rows) = @_;
my $retval = {
row_count => my $row_count = @$rows,
col_count => my $col_count = @$columns,
};
for (my $col_index = 0 ; $col_index < $col_count ; $col_index++) {
my $tmp;
for (my $row_index = 0 ; $row_index < $row_count ; $row_index++) {
$tmp->[$row_index] = $rows->[$row_index][$col_index];
}
$retval->{$columns->[$col_index]} = $tmp;
}
$retval
}
-
\$\begingroup\$ Thanks you very much. The first and second tricks are really interesting and from a c-style point of view very strange. i will try it with NYTProf asap. \$\endgroup\$some_coder– some_coder2014年03月11日 12:45:19 +00:00Commented Mar 11, 2014 at 12:45
-
\$\begingroup\$ @some_coder: Forget the C-style point of view when optimizing Perl :-) \$\endgroup\$choroba– choroba2014年03月11日 12:47:21 +00:00Commented Mar 11, 2014 at 12:47
-
\$\begingroup\$ In my benchmark (yay!) I've observed foreach loop was faster (and, what's even more important, more readable), so please change C-style fors to
for my $col_index (0 .. $col_count - 1)
. \$\endgroup\$Grzegorz Rożniecki– Grzegorz Rożniecki2014年03月11日 13:24:42 +00:00Commented Mar 11, 2014 at 13:24 -
\$\begingroup\$ @Xaerxess: In my Benchmark, switching to this style loop was slower. \$\endgroup\$choroba– choroba2014年03月11日 13:26:01 +00:00Commented Mar 11, 2014 at 13:26
-
\$\begingroup\$ @choroba I guess it depends on how many iterations you're doing - see this gist, for-c outperforms foreach only up to 4-8 elements to iterate over. Still, in terms of readability, foreach wins, and that said I wish I'd never have to optimize fors. \$\endgroup\$Grzegorz Rożniecki– Grzegorz Rożniecki2014年03月11日 13:55:47 +00:00Commented Mar 11, 2014 at 13:55
$self
) you are using the subroutine as a method. You can remove the prototypes($$)
as they are ignored in method calls anyway. \$\endgroup\$