Improving cache hit rate

Question 1

Hey so I have a function written in c whose cache performance I have to improve. The statistics are being provided by cachegrind. But I'm completely stuck and cannot get more than 10% improvement in the performance. I really need help with this.

Here is the function:

#define LARGER 50000
typedef struct { 
 char c1;
 double f1;
 int n1; 
 char c2;
 int n2;
 double f2; 
} data; 
char* func()
{
 data* B = (data*) calloc(LARGER, sizeof(data));
 if (!B) return 0;
 double sum_f = 0.0;
 double sum_n = 0;
 char* sum_c = (char*) malloc(( 2 * LARGER + 1) * sizeof(char));
 sum_c[2 * LARGER] = '0円';
 int i;
 for(i = 0; i < LARGER; i++)
 {
 sum_f += B[i].f1 + B[i].f2;
 sum_n += B[i].n1 + B[i].n2;
 sum_c[2 * i] = B[i].c1;
 sum_c[2 * i + 1] = B[i].c2;
 }
 free(B);
 return sum_c;
}

The first thing that I noticed is that the definition of the struct data is not very cache-friendly because it has a ton of padding.

So, I changed the definition according to allignment requiirements to this -

typedef struct {
 int n1;
 int n2;
 double f1;
 double f2;
 char c1;
 char c2;
} data_new;

But this gives me only around 10% increase in cache-performance. I have no ideas on how to modify the for loop to improve the spatial locality further.

Can anyone guide me on how to be able to write better loops that are cache-friendly.

P.S. I am doing these questions as a part of my self study of a computer architecture book and I have no instructor to seek help.

Question 2

What does this code do? Please tell us. See How to Ask.

Question 3

when calling any of the heap allocation functions: (malloc, calloc, realloc)

the returned type is void* so can be assigned to any pointer. Casting just clutters the code, making it more difficult to understand, debug, etc.
the expression sizeof( char ) is defined in the standard as 1. Multiplying anything by 1 has absolutely no effect. Using that expression as a multiplying just clutters the code, making it more difficult to understand, debug, etc.
always check (!=NULL) the returned value to assure the operation was successful.

regarding:

if (!B) return 0;

the function func() << horrible function name>> expects a returned value with type char * but the code is returning a int. Strongly suggest replacing that statement with:

if (!B) return NULL;

an item that will help with the performance is only performing certain calculations once, like:

sum_c[2 * i] = B[i].c1; 
sum_c[2 * i + 1] = B[i].c2;

could be reduced to:

size_t offset = i<<1; 
sum_c[offset++] = B[i].c1; 
sum_c[offset] = B[i].c2;

and similar such complexity reductions

these two variables:

double sum_f = 0.0; 
double sum_n = 0;

and these two statements

sum_f += B[i].f1 + B[i].f2; 
sum_n += B[i].n1 + B[i].n2;

are not used, so can be completely eliminated.

it is a good program practice to separate the definition of a struct from a typedef on the struct name.

strongly suggest, when compiling, enabling all the warnings, then fix those warnings. (for gcc, at a minimum use: -Wall -Wextra -pedantic-errors I also use: -Wconversion -std=gnu11 )

and now the first revised code

#include <stdlib.h> // malloc(), calloc(), free()
#define LARGER 50000
struct dataStruct
{
 int n1;
 int n2;
 double f1;
 double f2;
 char c1;
 char c2;
};
typedef struct dataStruct data_new;
char* func( void )
{
 char* sum_c = malloc(( 2 * LARGER + 1) );
 if( !sum_c )
 return NULL;
 sum_c[2 * LARGER] = '0円';
 for(size_t i = 0; i < LARGER; i++)
 {
 size_t i2 = i<<1;
 char * sumPtr = sum_c[i2];
 *sumPtr++ = 0;
 *sumPtr = 0;
 }
 return sum_c;
} // end function: func

Note: the allocated array B[] is initialized to zero and is never updated, so can eliminate that array and the associated calls to the heap allocation function.

#include <stdlib.h> // calloc()
#define LARGER 50000
#if 0
struct dataStruct
{
 int n1;
 int n2;
 double f1;
 double f2;
 char c1;
 char c2;
};
typedef struct dataStruct data_new;
#endif
char* func( void )
{
 char* sum_c = calloc(( 2 * LARGER + 1) );
 return sum_c; // NULL or otherwise, return the result
} // end function: func

user3629249 user3629249 2,90811 silver badges10 bronze badges · Answer 1 · 2017-04-19 04:45:55Z

when calling any of the heap allocation functions: (malloc, calloc, realloc)

the returned type is void* so can be assigned to any pointer. Casting just clutters the code, making it more difficult to understand, debug, etc.
the expression sizeof( char ) is defined in the standard as 1. Multiplying anything by 1 has absolutely no effect. Using that expression as a multiplying just clutters the code, making it more difficult to understand, debug, etc.
always check (!=NULL) the returned value to assure the operation was successful.

regarding:

if (!B) return 0;

the function func() << horrible function name>> expects a returned value with type char * but the code is returning a int. Strongly suggest replacing that statement with:

if (!B) return NULL;

an item that will help with the performance is only performing certain calculations once, like:

sum_c[2 * i] = B[i].c1; 
sum_c[2 * i + 1] = B[i].c2;

could be reduced to:

size_t offset = i<<1; 
sum_c[offset++] = B[i].c1; 
sum_c[offset] = B[i].c2;

and similar such complexity reductions

these two variables:

double sum_f = 0.0; 
double sum_n = 0;

and these two statements

sum_f += B[i].f1 + B[i].f2; 
sum_n += B[i].n1 + B[i].n2;

are not used, so can be completely eliminated.

it is a good program practice to separate the definition of a struct from a typedef on the struct name.

strongly suggest, when compiling, enabling all the warnings, then fix those warnings. (for gcc, at a minimum use: -Wall -Wextra -pedantic-errors I also use: -Wconversion -std=gnu11 )

and now the first revised code

#include <stdlib.h> // malloc(), calloc(), free()
#define LARGER 50000
struct dataStruct
{
 int n1;
 int n2;
 double f1;
 double f2;
 char c1;
 char c2;
};
typedef struct dataStruct data_new;
char* func( void )
{
 char* sum_c = malloc(( 2 * LARGER + 1) );
 if( !sum_c )
 return NULL;
 sum_c[2 * LARGER] = '0円';
 for(size_t i = 0; i < LARGER; i++)
 {
 size_t i2 = i<<1;
 char * sumPtr = sum_c[i2];
 *sumPtr++ = 0;
 *sumPtr = 0;
 }
 return sum_c;
} // end function: func

Note: the allocated array B[] is initialized to zero and is never updated, so can eliminate that array and the associated calls to the heap allocation function.

#include <stdlib.h> // calloc()
#define LARGER 50000
#if 0
struct dataStruct
{
 int n1;
 int n2;
 double f1;
 double f2;
 char c1;
 char c2;
};
typedef struct dataStruct data_new;
#endif
char* func( void )
{
 char* sum_c = calloc(( 2 * LARGER + 1) );
 return sum_c; // NULL or otherwise, return the result
} // end function: func

Stack Exchange Network

Improving cache hit rate

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Improving cache hit rate

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions