\$\begingroup\$
\$\endgroup\$
3
I have a matrix transpose function that takes arrays with a possibly different number of rows and columns.
How can I improve performance and code quality?
void matrix_transpose(float *matrix, int rows, int columns)
{
int current_row;
int current_column;
float temp;
float buffer[MATRIX_TRANSPOSE_BUFFER_SIZE];
int buffer_position = 0;
if(rows == columns)
{
for(current_row = 0; current_row < rows; ++current_row)
{
for(current_column = current_row; current_column < columns; ++current_column)
{
temp = *(matrix + current_row * columns + current_column);
*(matrix + current_row * columns + current_column) = *(matrix + current_column * columns + current_row);
*(matrix + current_column * columns + current_row) = temp;
}
}
}
else if(rows * columns < MATRIX_TRANSPOSE_BUFFER_SIZE)
{
for(current_column = 0; current_column < columns; ++current_column)
{
for(current_row = 0; current_row < rows; ++current_row)
{
buffer[buffer_position++] = *(matrix + current_row * columns + current_column);
}
}
buffer_position = 0;
while(buffer_position < rows * columns)
{
*matrix++ = buffer[buffer_position++];
}
}
}
Here's an example call:
float matrix[] = {
1, 2, 3, 4,
5, 6, 7, 8,
9, 10, 11, 12,
13, 14, 15, 16,
17, 18, 19, 20
};
matrix_transpose(matrix, 5, 4);
Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Oct 20, 2013 at 6:06
1 Answer 1
\$\begingroup\$
\$\endgroup\$
4
- The call can apparently fail if the matrix supplied is larger than the internal buffer in the function. Yet there is no indication that this was the case.
matrix_transpose
should return an error code indicating the result of the operation. float buffer[MATRIX_TRANSPOSE_BUFFER_SIZE];
lives on the stack of the function. The stack is usually limited to a few MB (I think default in Visual Studio is 1MB, on Linux it's typically around 8MB but depends on the setup). Not sure how big the matrices will get but this might get you into trouble. Consider using either a static buffer (not thread safe) or allocate the buffer viamalloc
on the heap.- Instead of
*(matrix + current_row * columns + current_column)
you can also writematrix[current_row * columns + current_column]
which is a bit easier on the eyes (imho). - It is accepted practice that loop counters can be single letter variables. In conjunction with the fact that you operate on a 2 dimensional array you could consider renaming
current_row
intoy
andcurrent_column
intox
. Shortens the code somewhat and doesn't loose much meaning. Usingx
andy
to identify positions in a 2D structure is common practice. - When you transpose a matrix which is not square then the calling code will not be directly exposed to the fact that
rows
andcolumns
are now transposed. You should passrows
andcolumns
"by reference" (int *
) and swap them before you return from the function. Otherwise the calling could needs to do that every time it calls the transpose function and the programmer is bound to forget it at one point.
answered Oct 20, 2013 at 7:53
-
\$\begingroup\$ Won't calling malloc every time the function runs make it slower? \$\endgroup\$2013Asker– 2013Asker2013年10月20日 09:03:17 +00:00Commented Oct 20, 2013 at 9:03
-
\$\begingroup\$ I tested calling malloc and free, the total cost for the function went up from just over 640 to over 4000 on the first call and to over 1000 on subsequent calls. Would malloc definitely be considered a better approach? \$\endgroup\$2013Asker– 2013Asker2013年10月20日 09:31:08 +00:00Commented Oct 20, 2013 at 9:31
-
1\$\begingroup\$ Well, if your matrices are always small then keep it on the stack but if you want to make them bigger in the future then you simply can't keep them on the stack. Sure, malloc and free add some overhead. \$\endgroup\$ChrisWue– ChrisWue2013年10月20日 10:54:06 +00:00Commented Oct 20, 2013 at 10:54
-
\$\begingroup\$ There's a term we use to describe getting fast results with code that's not quite correct: it's called cheating on a benchmark. Either you fix the bug to handle the general case, or officially recognize the limitation (with a comment on the function and an error code). \$\endgroup\$200_success– 200_success2013年10月20日 20:27:58 +00:00Commented Oct 20, 2013 at 20:27
lang-c
memcpy()
as it's usually better optimized than your loop and will make your code more readable. \$\endgroup\$