Is there a way to transpose a matrix in ANSI C
way much better than this ANSI C
code?
Can I do transpose in a different and faster way?
The focus of this aim is: Speed
/*
* Turn A into transponse A^T
* A[row * column]
*/
void tran(float A[], size_t row, size_t column) {
/* Decleration */
float *B = (float*)malloc(row * column * sizeof(float));
float* transpose = NULL;
float *ptr_A = A;
size_t i, j;
for (i = 0; i < row; i++) {
transpose = &B[i];
for (j = 0; j < column; j++) {
*transpose = *ptr_A;
ptr_A++;
transpose += row;
}
}
/* Copy! */
memcpy(A, B, row*column*sizeof(float));
/* Free */
free(B);
}
1 Answer 1
Performance
To get the best performance on contemporary CPUs you need a very good understanding of the memory and cache systems, the prefetcher, and if vector instructions can be used. The best algorithm to use will also depend on the size of the matrix.
There are also some performance issues caused by the language itself, and I recommend you target at least C99 so you can use restrict
.
You are also creating a temporary matrix and copying it back at the end. This has some overhead that could be avoided in two ways:
- Don't modify
A[]
at all, but let the caller supply aB[]
where the result will be stored. - Modify
A[]
in-place. This is rather easy for square matrices, but it will quickly become complicated for non-square matrices. Still, you could do the in-place method for square ones and fall back to your original code for non-square ones.
Naming things
The names of the function and the variable names could be improved. Most importantly, I would make sure the function is named transpose()
. Unnecessary abbreviations can cause confusion later on. The variable transpose
must then be renamed of course, at first glance a better name would just be ptr_B
.
row
and column
are singular, so it sounds like those are row and column indices, not counts. rows
and columns
would be better, or if you want to be even more explicit, num_rows
and num_columns
. Once you have done that, you can rename i
and j
to row
and column
.
Simplified code
The code can be greatly simplified by using C99 and using []
instead of pointer arithmetic:
void transpose(const float *restrict input, float *restrict output,
size_t num_rows, size_t num_columns)
{
for (size_t row = 0; row < num_rows; row++)
for (size_t column = 0; column < num_columns; column++)
output[column * num_rows + row] = input[row * num_columns + column];
}
An optimizing compiler will generate similar assembly code for the loop.
-
\$\begingroup\$ My code is going to be used with C++, so I cannot use C99. Sorry, but thanks for the advice about restrict. \$\endgroup\$euraad– euraad2023年12月28日 13:09:19 +00:00Commented Dec 28, 2023 at 13:09
-
\$\begingroup\$ You can most likely use
restrict
, or at least some version of it, in C++ as well, see this question. You can always define a macro for it that expands torestrict
,__restrict__
or nothing depending on what the compiler supports. \$\endgroup\$G. Sliepen– G. Sliepen2023年12月28日 13:21:23 +00:00Commented Dec 28, 2023 at 13:21 -
\$\begingroup\$ Will it be faster if you did
input += num_columns
instead? Then you don't need the multiplication*
statement. \$\endgroup\$euraad– euraad2023年12月28日 13:40:53 +00:00Commented Dec 28, 2023 at 13:40 -
\$\begingroup\$ No, the compiler will already do that for you automatically; it's called strength reduction, and is a very basic optimization. I recommend you watch this video to get an idea of what the compiler does for you. \$\endgroup\$G. Sliepen– G. Sliepen2023年12月28日 18:46:10 +00:00Commented Dec 28, 2023 at 18:46
-
1\$\begingroup\$ I understand. Thank you for your advices. Early optimization is the root of evil. \$\endgroup\$euraad– euraad2023年12月28日 19:09:40 +00:00Commented Dec 28, 2023 at 19:09