Performance optimization in function for datastructure mapping

Question 1

I want to optimize a Perl function which is frequently used in my application. The function creates a special datastructure from the results of DBI::fetchall_arrayref which looks like:

$columns = ['COLNAME_1','COLNAME_2','COLNAME_3']
$rows = [ ['row_1_col_1', 'row_1_col_2', 'row_1_col_3'],
 ['row_2_col_1', 'row_2_col_2', 'row_2_col_3'],
 ['row_3_col_1', 'row_3_col_2', 'row_3_col_3']
];

The new datastructure must contain the data in the following form (all row-values for every column in a single arrayref)

$retval = { 
 row_count => 3,
 col_count => 3,
 COLNAME_1 => ['row_1_col_1', 'row_2_col_1', 'row_3_col_1' ],
 COLNAME_2 => ['row_1_col_2', 'row_2_col_2', 'row_3_col_2' ],
 COLNAME_3 => ['row_1_col_3', 'row_2_col_3', 'row_3_col_3' ]
}

The new datastructure is a Hash of Arrays and is used in the whole application. I cannot change the format (its too frequently used). I wrote a function for this conversion. I've already done some some performance optimization after profiling my application. But it's not enough. Now the function looks like:

sub reorganize($$) {
 my ($self,$columns,$rows) = @_;
 my $col_count = scalar(@$columns);
 my $row_count = scalar(@$rows);
 my $col_index = 0;
 my $row_index = 0;
 my $retval = { # new datastructure
 row_count => $row_count,
 col_count => $col_count 
 };
 # iterate through all columns
 for($col_index=0; $col_index<$col_count; $col_index++) {
 # create a arrayref for all row-values of the current column
 # set it to the correct size and assign all values to this arrayref
 my $tmp = [];
 $#{$tmp} = $row_count-1; # set size of array to the number of rows
 # iterate through all rows
 for($row_index=0; $row_index<$row_count; $row_index++) { 
 # assign values to arrayref (which has the correct size) instead of a "slow" push
 $tmp->[$row_index] = $rows->[$row_index][$col_index]; 
 }
 # Assign the arrayref to the hash. The hash-key is the name of the column
 $retval->{$columns->[$col_index]} = $tmp;
 }
 return $retval;
}

My Question:

Is there a way to further optimize this function (maybe using $[...])? I found some hints here at page 18 and 19, but I don't have any experience in using $ in different contexts.

I have to say that the function listed above is the best I can do. There may be other ways to do some optimization which I have never heard of.

Question 2

Just a note: It seems ($self) you are using the subroutine as a method. You can remove the prototypes ($$) as they are ignored in method calls anyway.

Question 3

The following code is about 35% faster (measured with Benchmark). The tricks:

no anonymous array created for $tmp.
explicit return removed.
variables created in place where their value is needed.

Some of the tricks added just a 3%, the first one seemed the most important. YMMV.

I experimented with $_ and maps, too, but it seems the plain old C-style loop is the fastest.

sub faster {
 my ($self, $columns, $rows) = @_;
 my $retval = {
 row_count => my $row_count = @$rows,
 col_count => my $col_count = @$columns,
 };
 for (my $col_index = 0 ; $col_index < $col_count ; $col_index++) {
 my $tmp;
 for (my $row_index = 0 ; $row_index < $row_count ; $row_index++) {
 $tmp->[$row_index] = $rows->[$row_index][$col_index];
 }
 $retval->{$columns->[$col_index]} = $tmp;
 }
 $retval
}

Question 4

Thanks you very much. The first and second tricks are really interesting and from a c-style point of view very strange. i will try it with NYTProf asap.

Question 5

@some_coder: Forget the C-style point of view when optimizing Perl :-)

Question 6

In my benchmark (yay!) I've observed foreach loop was faster (and, what's even more important, more readable), so please change C-style fors to for my $col_index (0 .. $col_count - 1).

Question 7

@Xaerxess: In my Benchmark, switching to this style loop was slower.

Question 8

@choroba I guess it depends on how many iterations you're doing - see this gist, for-c outperforms foreach only up to 4-8 elements to iterate over. Still, in terms of readability, foreach wins, and that said I wish I'd never have to optimize fors.

choroba choroba 1,4139 silver badges11 bronze badges · Accepted Answer · 2014-03-11 12:35:19Z

3

\$\begingroup\$

The following code is about 35% faster (measured with Benchmark). The tricks:

no anonymous array created for $tmp.
explicit return removed.
variables created in place where their value is needed.

Some of the tricks added just a 3%, the first one seemed the most important. YMMV.

I experimented with $_ and maps, too, but it seems the plain old C-style loop is the fastest.

sub faster {
 my ($self, $columns, $rows) = @_;
 my $retval = {
 row_count => my $row_count = @$rows,
 col_count => my $col_count = @$columns,
 };
 for (my $col_index = 0 ; $col_index < $col_count ; $col_index++) {
 my $tmp;
 for (my $row_index = 0 ; $row_index < $row_count ; $row_index++) {
 $tmp->[$row_index] = $rows->[$row_index][$col_index];
 }
 $retval->{$columns->[$col_index]} = $tmp;
 }
 $retval
}

Share

answered Mar 11, 2014 at 12:35

choroba's user avatar

choroba choroba

1,4139 silver badges11 bronze badges

\$\endgroup\$

5

\$\begingroup\$ Thanks you very much. The first and second tricks are really interesting and from a c-style point of view very strange. i will try it with NYTProf asap. \$\endgroup\$

some_coder
– some_coder

2014年03月11日 12:45:19 +00:00
Commented Mar 11, 2014 at 12:45
\$\begingroup\$ @some_coder: Forget the C-style point of view when optimizing Perl :-) \$\endgroup\$

choroba
– choroba

2014年03月11日 12:47:21 +00:00
Commented Mar 11, 2014 at 12:47
\$\begingroup\$ In my benchmark (yay!) I've observed foreach loop was faster (and, what's even more important, more readable), so please change C-style fors to for my $col_index (0 .. $col_count - 1). \$\endgroup\$

Grzegorz Rożniecki
– Grzegorz Rożniecki

2014年03月11日 13:24:42 +00:00
Commented Mar 11, 2014 at 13:24
\$\begingroup\$ @Xaerxess: In my Benchmark, switching to this style loop was slower. \$\endgroup\$

choroba
– choroba

2014年03月11日 13:26:01 +00:00
Commented Mar 11, 2014 at 13:26
\$\begingroup\$ @choroba I guess it depends on how many iterations you're doing - see this gist, for-c outperforms foreach only up to 4-8 elements to iterate over. Still, in terms of readability, foreach wins, and that said I wish I'd never have to optimize fors. \$\endgroup\$

Grzegorz Rożniecki
– Grzegorz Rożniecki

2014年03月11日 13:55:47 +00:00
Commented Mar 11, 2014 at 13:55

Add a comment |

Stack Exchange Network

Performance optimization in function for datastructure mapping

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Performance optimization in function for datastructure mapping

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions