Hey so I have a function written in c
whose cache performance I have to improve. The statistics are being provided by cachegrind
. But I'm completely stuck and cannot get more than 10% improvement in the performance. I really need help with this.
Here is the function:
#define LARGER 50000
typedef struct {
char c1;
double f1;
int n1;
char c2;
int n2;
double f2;
} data;
char* func()
{
data* B = (data*) calloc(LARGER, sizeof(data));
if (!B) return 0;
double sum_f = 0.0;
double sum_n = 0;
char* sum_c = (char*) malloc(( 2 * LARGER + 1) * sizeof(char));
sum_c[2 * LARGER] = '0円';
int i;
for(i = 0; i < LARGER; i++)
{
sum_f += B[i].f1 + B[i].f2;
sum_n += B[i].n1 + B[i].n2;
sum_c[2 * i] = B[i].c1;
sum_c[2 * i + 1] = B[i].c2;
}
free(B);
return sum_c;
}
The first thing that I noticed is that the definition of the struct data
is not very cache-friendly because it has a ton of padding.
So, I changed the definition according to allignment requiirements to this -
typedef struct {
int n1;
int n2;
double f1;
double f2;
char c1;
char c2;
} data_new;
But this gives me only around 10% increase in cache-performance. I have no ideas on how to modify the for loop to improve the spatial locality further.
Can anyone guide me on how to be able to write better loops that are cache-friendly.
P.S. I am doing these questions as a part of my self study of a computer architecture book and I have no instructor to seek help.
-
2\$\begingroup\$ What does this code do? Please tell us. See How to Ask. \$\endgroup\$200_success– 200_success2017年04月19日 04:19:16 +00:00Commented Apr 19, 2017 at 4:19
1 Answer 1
when calling any of the heap allocation functions: (malloc, calloc, realloc)
- the returned type is void* so can be assigned to any pointer. Casting just clutters the code, making it more difficult to understand, debug, etc.
- the expression sizeof( char ) is defined in the standard as 1. Multiplying anything by 1 has absolutely no effect. Using that expression as a multiplying just clutters the code, making it more difficult to understand, debug, etc.
- always check (!=NULL) the returned value to assure the operation was successful.
regarding:
if (!B) return 0;
the function func() << horrible function name>> expects a returned value with type char *
but the code is returning a int
. Strongly suggest replacing that statement with:
if (!B) return NULL;
an item that will help with the performance is only performing certain calculations once, like:
sum_c[2 * i] = B[i].c1;
sum_c[2 * i + 1] = B[i].c2;
could be reduced to:
size_t offset = i<<1;
sum_c[offset++] = B[i].c1;
sum_c[offset] = B[i].c2;
and similar such complexity reductions
these two variables:
double sum_f = 0.0;
double sum_n = 0;
and these two statements
sum_f += B[i].f1 + B[i].f2;
sum_n += B[i].n1 + B[i].n2;
are not used, so can be completely eliminated.
it is a good program practice to separate the definition of a struct
from a typedef
on the struct name.
strongly suggest, when compiling, enabling all the warnings, then fix those warnings. (for gcc, at a minimum use: -Wall -Wextra -pedantic-errors
I also use: -Wconversion -std=gnu11
)
and now the first revised code
#include <stdlib.h> // malloc(), calloc(), free()
#define LARGER 50000
struct dataStruct
{
int n1;
int n2;
double f1;
double f2;
char c1;
char c2;
};
typedef struct dataStruct data_new;
char* func( void )
{
char* sum_c = malloc(( 2 * LARGER + 1) );
if( !sum_c )
return NULL;
sum_c[2 * LARGER] = '0円';
for(size_t i = 0; i < LARGER; i++)
{
size_t i2 = i<<1;
char * sumPtr = sum_c[i2];
*sumPtr++ = 0;
*sumPtr = 0;
}
return sum_c;
} // end function: func
Note: the allocated array B[] is initialized to zero and is never updated, so can eliminate that array and the associated calls to the heap allocation function.
#include <stdlib.h> // calloc()
#define LARGER 50000
#if 0
struct dataStruct
{
int n1;
int n2;
double f1;
double f2;
char c1;
char c2;
};
typedef struct dataStruct data_new;
#endif
char* func( void )
{
char* sum_c = calloc(( 2 * LARGER + 1) );
return sum_c; // NULL or otherwise, return the result
} // end function: func