Sparse matrix multiplication

Question 1

I am working on a sparse matrix application in C and I choose compressed sparse row (CSC) and compressed sparse column (CSC) as my data structure for it. But the difficult part is I cannot improve my matrix multiplication function.

/**
 * The element and matrix are the naive format I have for sparse matrix,
 * They are just for storing information got from after reading the file(s).
 */
typedef struct{
 int row;
 int column;
 int id;
 int value;
} element;
typedef struct{
 int row;
 int column;
 int numberofElements;
 element* elements;
} matrix;
/**
 * CSR and CSC are compressed sparse row/column format to store sparse matrix
 * The detailed definition is here:http://en.wikipedia.org/wiki/Sparse_matrix
 */
typedef struct{
 int row;
 int column;
 int length;//The number of non-zero element in a matrix.
 int* val;
 int* col_ind;
 int* row_ptr;
} CSR;
typedef struct{
 int row;
 int column;
 int length;
 int* val;
 int* row_ind;
 int* col_ptr;
} CSC;
typedef struct{
 int val;
 int index;
} tuple;
int compute(tuple* row, tuple* column, int row_length, int col_length){
 int result = 0;
 for(int i = 0; i < row_length; i++){
 for(int j = 0; j < col_length; j++){
 if(row[i].index == column[j].index){
 result += row[i].val * column[j].val;
 }
 }
 }
 return result;
}
void multiply(matrix X, matrix Y, matrix* result){
 CSR cX;
 CSC cY; 
 int product = 0;
 tuple **row = NULL, **column = NULL;
 int *row_length,*col_length;
 matrixtoCSR(&cX, X);
 matrixtoCSC(&cY, Y);
 row = (tuple **) malloc(X.row * sizeof(tuple *));
 row_length = (int *) malloc (X.row * sizeof(int));
 column = (tuple **) malloc(Y.column *sizeof(tuple *));
 col_length = (int *) malloc (Y.column * sizeof (int ));
 for(int r = 0; r < X.row; r++){
 row_length[r] = (r == cX.row-1) ? cX.length - cX.row_ptr[r]:cX.row_ptr[r+1] - cX.row_ptr[r];
 row[r] = !row_length[r] ? NULL : (tuple* ) malloc (row_length[r] * sizeof(tuple));
 if(row[r] != NULL){
 for(int i = 0; i < row_length[r]; i++){
 row[r][i].val = cX.val[cX.row_ptr[r]+i];
 row[r][i].index = cX.col_ind[cX.row_ptr[r]+i];
 }
 }
 }
 for(int c = 0; c < Y.column; c++){
 col_length[c] = (c == cY.column-1) ? cY.length- cY.col_ptr[c]:cY.col_ptr[c+1] - cY.col_ptr[c];
 column[c] = !col_length[c] ? NULL : (tuple *) malloc (col_length[c] * sizeof(tuple));
 if(column != NULL){
 for(int i = 0; i < col_length[c]; i++){
 column[c][i].val = cY.val[cY.col_ptr[c]+i];
 column[c][i].index = cY.row_ind[cY.col_ptr[c]+i];
 }
 }
 }
 for(int r = 0; r < X.row; r++){
 for(int c = 0; c < Y.column; c++){
 product = 0;
 if(row[r] != NULL && column[c] != NULL)
 product=compute(row[r], column[c], row_length[r], col_length[c]);
 if(product){
 result->elements[result->numberofElements].row = r;
 result->elements[result->numberofElements].column = c;
 result->elements[result->numberofElements].id = r * result->column + c;
 result->elements[result->numberofElements].value = product;
 result->numberofElements++;
 }
 }
 }
 result->elements = (element *)realloc( result->elements , result->numberofElements * sizeof(element));
 for(int i = 0 ; i < X.row; i++){
 if(row[i] != NULL)
 free(row[i]);
 }
 free(row);
 for(int i = 0 ; i < Y.column; i++){
 if(column[i] != NULL)
 free(column[i]);
 }
 free(column);
 free(row_length);
 free(col_length);
 cleanCSR(&cX);
 cleanCSC(&cY);
}

I used gprof to prof it and it shows that if I multiply 4 1000*1000 matrix the program will take about 7s to compute the output and 98.7% of time it is running this function.

How can I improve it? I have tried a lot of things like trade-off some space but the time just doesn't go down. I think I am kind of stuck.

Question 2

Well you could make sure the indentation is consistent so people can read it.

Question 3

please post the struct and type definitions too - I wont review it unless I can compile it.

Question 4

Hi, I have updated it. would u please have a look at it now? Thanks.

Question 5

Your tuple type and matrixtoCSR and matrixtoCSC are still missing. Also please post a main() with a simple test case. BTW why do you use a CSC and a CSR?

Question 6

@WilliamMorris Hi I have updated it with the def of tuple but sorry I cannot provide a test case because the main method reads some files of certain form and then call the multiply function. It will be too many lines to paste here. If you want I can send a email to you withe all the source files and test files. The reason why I use CSC and CSR is that I read the sparse matrix entry in wikipedia and I think it is a good idea to store the matrix in this form since it is fast for row/column slicing.

Question 7

Here is my understanding of what you are doing:

you have some matrices stored in files using your own 'naive' format.
you read these into your matrix structures
two of these matrices are then passed to multiply()
you convert one of the matrices into a Compressed Row Storage
and the other into a Compressed Column Storage
you then extract the matrices from these compressed forms into arrays of values, one for rows and one for columns
and finally you multiply them.

I find that complicated!

Why not store the matrices directly in compressed format?
Or convert your naive format into the extracted form without the CSC/CSR stage?
Why use both CSR and CSC? They are both methods of compressing a matrix but they do the same job. I can see no point in using both.

With these defects, there is not much point in reviewing the code in detail, but here are some general points:

include the right headers and don't cast the return from malloc
know that calloc gives you a zeroed array
use const where possible on function parameters (eg row/column in compute, X/Y in multiply)
use static where possible for functions
compute and multiply are badly named
improve your naming in general. You have four structures all containing fields named row and column. I'm sure at least two of these contain the number of rows/columns (eg. n_rows). And your tuple lists are called row and column. It all becomes very confusing.
pass by reference is better for structures.
your multiply does a lot more than just multiply the matrices. It first compresses the input matrices and then expands the compressed matrices. These steps belong elsewhere (if they belong anywhere). Your first two big for-loops shoud be separate functions. The nested-for in each might well belong in a separate function too (difficult to tell, but nested loops are often better split). And the free-loops should also be separate functions. By splitting a big function into several smaller ones (with appropriate names), you make the code easier to read and test.

Sorry to be negative.

Question 8

Thanks. I will try to follow your advice and modify my code. Thanks indeed.

William Morris William Morris 9,40419 silver badges43 bronze badges · Answer 1 · 2013-01-06 20:02:57Z

Here is my understanding of what you are doing:

you have some matrices stored in files using your own 'naive' format.
you read these into your matrix structures
two of these matrices are then passed to multiply()
you convert one of the matrices into a Compressed Row Storage
and the other into a Compressed Column Storage
you then extract the matrices from these compressed forms into arrays of values, one for rows and one for columns
and finally you multiply them.

I find that complicated!

Why not store the matrices directly in compressed format?
Or convert your naive format into the extracted form without the CSC/CSR stage?
Why use both CSR and CSC? They are both methods of compressing a matrix but they do the same job. I can see no point in using both.

With these defects, there is not much point in reviewing the code in detail, but here are some general points:

include the right headers and don't cast the return from malloc
know that calloc gives you a zeroed array
use const where possible on function parameters (eg row/column in compute, X/Y in multiply)
use static where possible for functions
compute and multiply are badly named
improve your naming in general. You have four structures all containing fields named row and column. I'm sure at least two of these contain the number of rows/columns (eg. n_rows). And your tuple lists are called row and column. It all becomes very confusing.
pass by reference is better for structures.
your multiply does a lot more than just multiply the matrices. It first compresses the input matrices and then expands the compressed matrices. These steps belong elsewhere (if they belong anywhere). Your first two big for-loops shoud be separate functions. The nested-for in each might well belong in a separate function too (difficult to tell, but nested loops are often better split). And the free-loops should also be separate functions. By splitting a big function into several smaller ones (with appropriate names), you make the code easier to read and test.

Sorry to be negative.

Thanks. I will try to follow your advice and modify my code. Thanks indeed.

Stack Exchange Network

Sparse matrix multiplication

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Sparse matrix multiplication

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions