
I want to optimize a Perl function that is used frequently in my application. It builds a special data structure from the results of DBI::fetchall_arrayref, which look like this:

$columns = ['COLNAME_1', 'COLNAME_2', 'COLNAME_3'];
$rows = [ ['row_1_col_1', 'row_1_col_2', 'row_1_col_3'],
          ['row_2_col_1', 'row_2_col_2', 'row_2_col_3'],
          ['row_3_col_1', 'row_3_col_2', 'row_3_col_3']
        ];

The new data structure must hold the data in the following form (all row values for each column in a single arrayref):

$retval = {
    row_count => 3,
    col_count => 3,
    COLNAME_1 => ['row_1_col_1', 'row_2_col_1', 'row_3_col_1'],
    COLNAME_2 => ['row_1_col_2', 'row_2_col_2', 'row_3_col_2'],
    COLNAME_3 => ['row_1_col_3', 'row_2_col_3', 'row_3_col_3'],
};

The new data structure is a hash of arrays and is used throughout the application. I cannot change the format (it is used in too many places). I wrote a function for this conversion and have already done some performance optimization after profiling my application, but it is not enough. The function currently looks like this:

sub reorganize($$) {
    my ($self, $columns, $rows) = @_;
    my $col_count = scalar(@$columns);
    my $row_count = scalar(@$rows);
    my $col_index = 0;
    my $row_index = 0;
    my $retval = {    # new data structure
        row_count => $row_count,
        col_count => $col_count
    };
    # iterate through all columns
    for ($col_index = 0; $col_index < $col_count; $col_index++) {
        # create an arrayref for all row values of the current column,
        # set it to the correct size and assign all values to it
        my $tmp = [];
        $#{$tmp} = $row_count - 1;    # set size of array to the number of rows
        # iterate through all rows
        for ($row_index = 0; $row_index < $row_count; $row_index++) {
            # assign values to the presized arrayref instead of a "slow" push
            $tmp->[$row_index] = $rows->[$row_index][$col_index];
        }
        # assign the arrayref to the hash; the hash key is the column name
        $retval->{$columns->[$col_index]} = $tmp;
    }
    return $retval;
}

My question:

Is there a way to optimize this function further (maybe using $[...])? I found some hints here on pages 18 and 19, but I don't have any experience using $ in different contexts.

The function above is the best I can do so far. There may be other optimization techniques I have never heard of.

Jamal
asked Mar 11, 2014 at 11:55
  • Just a note: it seems ($self) you are using the subroutine as a method. You can remove the prototype ($$), as prototypes are ignored in method calls anyway. Commented Mar 11, 2014 at 12:12
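A small sketch of what the comment describes. It uses a hypothetical package `Demo` and a stripped-down stand-in for `reorganize` that only reports how many arguments arrived. A method call passes all three arguments despite the `($$)` prototype, while a plain function call with three arguments is rejected at compile time; the string `eval` is there only so that compile error can be caught at run time.

```perl
use strict;
use warnings;

# Hypothetical package, used only to show how Perl treats prototypes.
package Demo;

# Same ($$) prototype as in the question: it declares two scalar arguments.
sub reorganize($$) {
    my ($self, $columns, $rows) = @_;
    return scalar @_;    # how many arguments actually arrived
}

package main;

# Method call: the prototype is not consulted, so the invocant
# plus both array refs (three arguments) are passed through.
my $n = Demo->reorganize([], []);
print "method call received $n arguments\n";

# Plain function call with three arguments: the prototype IS checked
# when the call is compiled, so this string eval fails to compile.
my $ok = eval 'Demo::reorganize("Demo", [], []); 1';
print "function call failed: $@" unless $ok;
```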

1 Answer


The following code is about 35% faster (measured with Benchmark). The tricks:

  • no anonymous array created for $tmp.

  • explicit return removed.

  • variables created in place where their value is needed.

Some of the tricks gained only about 3%; the first one seemed the most important. YMMV.
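The first trick relies on Perl's autovivification: when an undefined scalar is dereferenced as an array in an assignment, Perl creates the anonymous array on the spot, so `my $tmp = []` is not needed. A minimal illustration (the stored value is just sample data):

```perl
use strict;
use warnings;

my $tmp;    # undef: no anonymous array has been created yet
print defined $tmp ? "defined\n" : "undef\n";    # prints "undef"

# Storing through $tmp->[...] autovivifies an array ref on the first store.
$tmp->[2] = 'row_3_col_1';
print ref($tmp), "\n";       # prints "ARRAY"
print scalar(@$tmp), "\n";   # prints "3": elements 0 and 1 exist but are undef
```

One difference worth checking, since the answer's version also drops the `$#{$tmp} = $row_count-1` presizing: with zero rows the inner loop never runs, so `faster()` stores undef for each column where the original stores an empty arrayref. Code that dereferences those values, e.g. `@{ $retval->{COLNAME_1} }`, can die under `use strict` in that case.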

I experimented with $_ and maps, too, but it seems the plain old C-style loop is the fastest.
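The answer does not show its `map` experiment, so the following is only a guess at what such a variant might look like: one `map` per column that pulls element `$col_index` out of every row. `with_map` is a hypothetical name, and the sketch is here for comparison only; per the answer's measurements, expect the C-style loop to be faster.

```perl
use strict;
use warnings;

# Hypothetical map-based variant of the transposition (not the answer's code).
sub with_map {
    my ($self, $columns, $rows) = @_;
    my %retval = (
        row_count => scalar @$rows,
        col_count => scalar @$columns,
    );
    for my $col_index (0 .. $#$columns) {
        # Collect column $col_index from every row in one pass.
        $retval{ $columns->[$col_index] } = [ map { $_->[$col_index] } @$rows ];
    }
    return \%retval;
}

# Sample data from the question.
my $columns = ['COLNAME_1', 'COLNAME_2', 'COLNAME_3'];
my $rows = [
    ['row_1_col_1', 'row_1_col_2', 'row_1_col_3'],
    ['row_2_col_1', 'row_2_col_2', 'row_2_col_3'],
    ['row_3_col_1', 'row_3_col_2', 'row_3_col_3'],
];
my $r = with_map(undef, $columns, $rows);
print join(', ', @{ $r->{COLNAME_2} }), "\n";
# prints "row_1_col_2, row_2_col_2, row_3_col_2"
```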

sub faster {
    my ($self, $columns, $rows) = @_;
    my $retval = {
        row_count => my $row_count = @$rows,
        col_count => my $col_count = @$columns,
    };
    for (my $col_index = 0 ; $col_index < $col_count ; $col_index++) {
        my $tmp;
        for (my $row_index = 0 ; $row_index < $row_count ; $row_index++) {
            $tmp->[$row_index] = $rows->[$row_index][$col_index];
        }
        $retval->{$columns->[$col_index]} = $tmp;
    }
    $retval
}
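To reproduce a comparison like the 35% figure on your own data, something along these lines could work. It is a sketch: the 20 x 1000 test data is made up, the question's function is condensed with its prototype dropped (needed here because it is called as a plain function), and `cmpthese(-1, ...)` from the core Benchmark module runs each version for at least one CPU second. Before timing, it checks that both versions build identical structures by comparing Data::Dumper output with sorted keys.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Data::Dumper;

# The question's version, condensed; prototype dropped for plain calls.
sub reorganize {
    my ($self, $columns, $rows) = @_;
    my $col_count = scalar(@$columns);
    my $row_count = scalar(@$rows);
    my $retval = { row_count => $row_count, col_count => $col_count };
    for (my $col_index = 0; $col_index < $col_count; $col_index++) {
        my $tmp = [];
        $#{$tmp} = $row_count - 1;    # presize the column array
        for (my $row_index = 0; $row_index < $row_count; $row_index++) {
            $tmp->[$row_index] = $rows->[$row_index][$col_index];
        }
        $retval->{ $columns->[$col_index] } = $tmp;
    }
    return $retval;
}

# The answer's version, unchanged.
sub faster {
    my ($self, $columns, $rows) = @_;
    my $retval = {
        row_count => my $row_count = @$rows,
        col_count => my $col_count = @$columns,
    };
    for (my $col_index = 0; $col_index < $col_count; $col_index++) {
        my $tmp;
        for (my $row_index = 0; $row_index < $row_count; $row_index++) {
            $tmp->[$row_index] = $rows->[$row_index][$col_index];
        }
        $retval->{ $columns->[$col_index] } = $tmp;
    }
    $retval
}

# Hypothetical test data: 20 columns x 1000 rows (sizes are arbitrary).
my @columns = map { "COLNAME_$_" } 1 .. 20;
my @rows    = map { my $r = $_; [ map { "row_${r}_col_$_" } 1 .. 20 ] } 1 .. 1000;

# Sanity check: both versions must build the same structure.
$Data::Dumper::Sortkeys = 1;
die "results differ\n"
    unless Dumper(reorganize(undef, \@columns, \@rows))
        eq Dumper(faster(undef, \@columns, \@rows));

# Prints a rate table; the numbers vary by machine and Perl version.
cmpthese(-1, {
    reorganize => sub { reorganize(undef, \@columns, \@rows) },
    faster     => sub { faster(undef, \@columns, \@rows) },
});
```

The rates depend on Perl version, hardware and data shape, so they may differ from the 35% above; Devel::NYTProf, which the asker mentions in the comments, is better suited to finding where time goes inside the whole application.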
answered Mar 11, 2014 at 12:35
  • Thank you very much. The first and second tricks are really interesting and, from a C-style point of view, very strange. I will try them with NYTProf asap. Commented Mar 11, 2014 at 12:45
  • @some_coder: Forget the C-style point of view when optimizing Perl :-) Commented Mar 11, 2014 at 12:47
  • In my benchmark (yay!) I've observed that the foreach loop was faster (and, even more importantly, more readable), so please change the C-style for loops to for my $col_index (0 .. $col_count - 1). Commented Mar 11, 2014 at 13:24
  • @Xaerxess: In my benchmark, switching to this loop style was slower. Commented Mar 11, 2014 at 13:26
  • @choroba I guess it depends on how many iterations you're doing; see this gist: for-c outperforms foreach only up to 4-8 elements to iterate over. Still, in terms of readability foreach wins, and that said, I wish I never had to optimize fors. Commented Mar 11, 2014 at 13:55
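For the foreach style suggested in the comments, here is a sketch of `faster()` with both C-style loops replaced by range loops. `faster_foreach` is a hypothetical name; whether it beats the C-style version seems to depend on the data shape, as the last comment notes, so it is worth running through the same kind of benchmark before adopting it.

```perl
use strict;
use warnings;

# faster() with C-style loops swapped for range (foreach) loops.
sub faster_foreach {
    my ($self, $columns, $rows) = @_;
    my $retval = {
        row_count => my $row_count = @$rows,
        col_count => my $col_count = @$columns,
    };
    for my $col_index (0 .. $col_count - 1) {
        my $tmp;    # autovivified on the first store, as in faster()
        for my $row_index (0 .. $row_count - 1) {
            $tmp->[$row_index] = $rows->[$row_index][$col_index];
        }
        $retval->{ $columns->[$col_index] } = $tmp;
    }
    $retval
}

# Tiny example: 3 rows, 2 columns.
my $r = faster_foreach(undef, ['A', 'B'], [[1, 2], [3, 4], [5, 6]]);
print "@{ $r->{A} } | @{ $r->{B} }\n";    # prints "1 3 5 | 2 4 6"
```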
