Transpose matrix in ANSI C

Question 1

Is there a way to transpose a matrix in ANSI C way much better than this ANSI C code? Can I do transpose in a different and faster way?

The focus of this aim is: Speed

/*
 * Turn A into transponse A^T
 * A[row * column]
 */
void tran(float A[], size_t row, size_t column) {
 /* Decleration */
 float *B = (float*)malloc(row * column * sizeof(float));
 float* transpose = NULL;
 float *ptr_A = A;
 size_t i, j;
 for (i = 0; i < row; i++) {
 transpose = &B[i];
 for (j = 0; j < column; j++) {
 *transpose = *ptr_A;
 ptr_A++;
 transpose += row;
 }
 }
 /* Copy! */
 memcpy(A, B, row*column*sizeof(float));
 /* Free */
 free(B);
}

Question 2

Performance

To get the best performance on contemporary CPUs you need a very good understanding of the memory and cache systems, the prefetcher, and if vector instructions can be used. The best algorithm to use will also depend on the size of the matrix.

There are also some performance issues caused by the language itself, and I recommend you target at least C99 so you can use restrict.

You are also creating a temporary matrix and copying it back at the end. This has some overhead that could be avoided in two ways:

Don't modify A[] at all, but let the caller supply a B[] where the result will be stored.
Modify A[] in-place. This is rather easy for square matrices, but it will quickly become complicated for non-square matrices. Still, you could do the in-place method for square ones and fall back to your original code for non-square ones.

Naming things

The names of the function and the variable names could be improved. Most importantly, I would make sure the function is named transpose(). Unnecessary abbreviations can cause confusion later on. The variable transpose must then be renamed of course, at first glance a better name would just be ptr_B.

row and column are singular, so it sounds like those are row and column indices, not counts. rows and columns would be better, or if you want to be even more explicit, num_rows and num_columns. Once you have done that, you can rename i and j to row and column.

Simplified code

The code can be greatly simplified by using C99 and using [] instead of pointer arithmetic:

void transpose(const float *restrict input, float *restrict output,
 size_t num_rows, size_t num_columns)
{
 for (size_t row = 0; row < num_rows; row++)
 for (size_t column = 0; column < num_columns; column++)
 output[column * num_rows + row] = input[row * num_columns + column];
}

An optimizing compiler will generate similar assembly code for the loop.

Question 3

My code is going to be used with C++, so I cannot use C99. Sorry, but thanks for the advice about restrict.

Question 4

You can most likely use restrict, or at least some version of it, in C++ as well, see this question. You can always define a macro for it that expands to restrict, __restrict__ or nothing depending on what the compiler supports.

Question 5

Will it be faster if you did input += num_columns instead? Then you don't need the multiplication * statement.

Question 6

No, the compiler will already do that for you automatically; it's called strength reduction, and is a very basic optimization. I recommend you watch this video to get an idea of what the compiler does for you.

Question 7

I understand. Thank you for your advices. Early optimization is the root of evil.

G. Sliepen G. Sliepen 68.9k3 gold badges74 silver badges180 bronze badges · Accepted Answer · 2023-12-28 12:14:40Z

Performance

To get the best performance on contemporary CPUs you need a very good understanding of the memory and cache systems, the prefetcher, and if vector instructions can be used. The best algorithm to use will also depend on the size of the matrix.

There are also some performance issues caused by the language itself, and I recommend you target at least C99 so you can use restrict.

You are also creating a temporary matrix and copying it back at the end. This has some overhead that could be avoided in two ways:

Don't modify A[] at all, but let the caller supply a B[] where the result will be stored.
Modify A[] in-place. This is rather easy for square matrices, but it will quickly become complicated for non-square matrices. Still, you could do the in-place method for square ones and fall back to your original code for non-square ones.

Naming things

The names of the function and the variable names could be improved. Most importantly, I would make sure the function is named transpose(). Unnecessary abbreviations can cause confusion later on. The variable transpose must then be renamed of course, at first glance a better name would just be ptr_B.

row and column are singular, so it sounds like those are row and column indices, not counts. rows and columns would be better, or if you want to be even more explicit, num_rows and num_columns. Once you have done that, you can rename i and j to row and column.

Simplified code

The code can be greatly simplified by using C99 and using [] instead of pointer arithmetic:

void transpose(const float *restrict input, float *restrict output,
 size_t num_rows, size_t num_columns)
{
 for (size_t row = 0; row < num_rows; row++)
 for (size_t column = 0; column < num_columns; column++)
 output[column * num_rows + row] = input[row * num_columns + column];
}

An optimizing compiler will generate similar assembly code for the loop.

My code is going to be used with C++, so I cannot use C99. Sorry, but thanks for the advice about restrict.
You can most likely use restrict, or at least some version of it, in C++ as well, see this question. You can always define a macro for it that expands to restrict, __restrict__ or nothing depending on what the compiler supports.
Will it be faster if you did input += num_columns instead? Then you don't need the multiplication * statement.
No, the compiler will already do that for you automatically; it's called strength reduction, and is a very basic optimization. I recommend you watch this video to get an idea of what the compiler does for you.
I understand. Thank you for your advices. Early optimization is the root of evil.

Stack Exchange Network

Transpose matrix in ANSI C

1 Answer 1

Performance

Naming things

Simplified code

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Transpose matrix in ANSI C

1 Answer 1

Performance

Naming things

Simplified code

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions