Matrix transpose function

Question 1

I have a matrix transpose function that takes arrays with a possibly different number of rows and columns.

How can I improve performance and code quality?

void matrix_transpose(float *matrix, int rows, int columns)
{
 int current_row;
 int current_column;
 float temp;
 float buffer[MATRIX_TRANSPOSE_BUFFER_SIZE];
 int buffer_position = 0;
 if(rows == columns)
 {
 for(current_row = 0; current_row < rows; ++current_row)
 {
 for(current_column = current_row; current_column < columns; ++current_column)
 {
 temp = *(matrix + current_row * columns + current_column);
 *(matrix + current_row * columns + current_column) = *(matrix + current_column * columns + current_row);
 *(matrix + current_column * columns + current_row) = temp;
 }
 }
 }
 else if(rows * columns < MATRIX_TRANSPOSE_BUFFER_SIZE)
 {
 for(current_column = 0; current_column < columns; ++current_column)
 {
 for(current_row = 0; current_row < rows; ++current_row)
 {
 buffer[buffer_position++] = *(matrix + current_row * columns + current_column);
 }
 }
 buffer_position = 0;
 while(buffer_position < rows * columns)
 {
 *matrix++ = buffer[buffer_position++];
 }
 }
}

Here's an example call:

float matrix[] = {
 1, 2, 3, 4,
 5, 6, 7, 8,
 9, 10, 11, 12,
 13, 14, 15, 16,
 17, 18, 19, 20
};
matrix_transpose(matrix, 5, 4);

Question 2

Try a XOR swap.

Question 3

Also, for copying memory, use memcpy() as it's usually better optimized than your loop and will make your code more readable.

Question 4

I was exploring the problem, and discovered that in-place matrix tranposition has been extensively studied.

Question 5

The call can apparently fail if the matrix supplied is larger than the internal buffer in the function. Yet there is no indication that this was the case. matrix_transpose should return an error code indicating the result of the operation.
float buffer[MATRIX_TRANSPOSE_BUFFER_SIZE]; lives on the stack of the function. The stack is usually limited to a few MB (I think default in Visual Studio is 1MB, on Linux it's typically around 8MB but depends on the setup). Not sure how big the matrices will get but this might get you into trouble. Consider using either a static buffer (not thread safe) or allocate the buffer via malloc on the heap.
Instead of *(matrix + current_row * columns + current_column) you can also write matrix[current_row * columns + current_column] which is a bit easier on the eyes (imho).
It is accepted practice that loop counters can be single letter variables. In conjunction with the fact that you operate on a 2 dimensional array you could consider renaming current_row into y and current_column into x. Shortens the code somewhat and doesn't loose much meaning. Using x and y to identify positions in a 2D structure is common practice.
When you transpose a matrix which is not square then the calling code will not be directly exposed to the fact that rows and columns are now transposed. You should pass rows and columns "by reference" (int *) and swap them before you return from the function. Otherwise the calling could needs to do that every time it calls the transpose function and the programmer is bound to forget it at one point.

Question 6

Won't calling malloc every time the function runs make it slower?

Question 7

I tested calling malloc and free, the total cost for the function went up from just over 640 to over 4000 on the first call and to over 1000 on subsequent calls. Would malloc definitely be considered a better approach?

Question 8

Well, if your matrices are always small then keep it on the stack but if you want to make them bigger in the future then you simply can't keep them on the stack. Sure, malloc and free add some overhead.

Question 9

There's a term we use to describe getting fast results with code that's not quite correct: it's called cheating on a benchmark. Either you fix the bug to handle the general case, or officially recognize the limitation (with a comment on the function and an error code).

ChrisWue ChrisWue 20.6k4 gold badges42 silver badges107 bronze badges · Accepted Answer · 2013-10-20 07:53:58Z

The call can apparently fail if the matrix supplied is larger than the internal buffer in the function. Yet there is no indication that this was the case. matrix_transpose should return an error code indicating the result of the operation.
float buffer[MATRIX_TRANSPOSE_BUFFER_SIZE]; lives on the stack of the function. The stack is usually limited to a few MB (I think default in Visual Studio is 1MB, on Linux it's typically around 8MB but depends on the setup). Not sure how big the matrices will get but this might get you into trouble. Consider using either a static buffer (not thread safe) or allocate the buffer via malloc on the heap.
Instead of *(matrix + current_row * columns + current_column) you can also write matrix[current_row * columns + current_column] which is a bit easier on the eyes (imho).
It is accepted practice that loop counters can be single letter variables. In conjunction with the fact that you operate on a 2 dimensional array you could consider renaming current_row into y and current_column into x. Shortens the code somewhat and doesn't loose much meaning. Using x and y to identify positions in a 2D structure is common practice.
When you transpose a matrix which is not square then the calling code will not be directly exposed to the fact that rows and columns are now transposed. You should pass rows and columns "by reference" (int *) and swap them before you return from the function. Otherwise the calling could needs to do that every time it calls the transpose function and the programmer is bound to forget it at one point.

Won't calling malloc every time the function runs make it slower?
I tested calling malloc and free, the total cost for the function went up from just over 640 to over 4000 on the first call and to over 1000 on subsequent calls. Would malloc definitely be considered a better approach?
Well, if your matrices are always small then keep it on the stack but if you want to make them bigger in the future then you simply can't keep them on the stack. Sure, malloc and free add some overhead.
There's a term we use to describe getting fast results with code that's not quite correct: it's called cheating on a benchmark. Either you fix the bug to handle the general case, or officially recognize the limitation (with a comment on the function and an error code).

Stack Exchange Network

Matrix transpose function

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Matrix transpose function

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions