I am working on a sparse matrix application in C and I choose compressed sparse row (CSC) and compressed sparse column (CSC) as my data structure for it. But the difficult part is I cannot improve my matrix multiplication function.
/**
* The element and matrix are the naive format I have for sparse matrix,
* They are just for storing information got from after reading the file(s).
*/
typedef struct{
int row;
int column;
int id;
int value;
} element;
typedef struct{
int row;
int column;
int numberofElements;
element* elements;
} matrix;
/**
* CSR and CSC are compressed sparse row/column format to store sparse matrix
* The detailed definition is here:http://en.wikipedia.org/wiki/Sparse_matrix
*/
typedef struct{
int row;
int column;
int length;//The number of non-zero element in a matrix.
int* val;
int* col_ind;
int* row_ptr;
} CSR;
typedef struct{
int row;
int column;
int length;
int* val;
int* row_ind;
int* col_ptr;
} CSC;
typedef struct{
int val;
int index;
} tuple;
int compute(tuple* row, tuple* column, int row_length, int col_length){
int result = 0;
for(int i = 0; i < row_length; i++){
for(int j = 0; j < col_length; j++){
if(row[i].index == column[j].index){
result += row[i].val * column[j].val;
}
}
}
return result;
}
void multiply(matrix X, matrix Y, matrix* result){
CSR cX;
CSC cY;
int product = 0;
tuple **row = NULL, **column = NULL;
int *row_length,*col_length;
matrixtoCSR(&cX, X);
matrixtoCSC(&cY, Y);
row = (tuple **) malloc(X.row * sizeof(tuple *));
row_length = (int *) malloc (X.row * sizeof(int));
column = (tuple **) malloc(Y.column *sizeof(tuple *));
col_length = (int *) malloc (Y.column * sizeof (int ));
for(int r = 0; r < X.row; r++){
row_length[r] = (r == cX.row-1) ? cX.length - cX.row_ptr[r]:cX.row_ptr[r+1] - cX.row_ptr[r];
row[r] = !row_length[r] ? NULL : (tuple* ) malloc (row_length[r] * sizeof(tuple));
if(row[r] != NULL){
for(int i = 0; i < row_length[r]; i++){
row[r][i].val = cX.val[cX.row_ptr[r]+i];
row[r][i].index = cX.col_ind[cX.row_ptr[r]+i];
}
}
}
for(int c = 0; c < Y.column; c++){
col_length[c] = (c == cY.column-1) ? cY.length- cY.col_ptr[c]:cY.col_ptr[c+1] - cY.col_ptr[c];
column[c] = !col_length[c] ? NULL : (tuple *) malloc (col_length[c] * sizeof(tuple));
if(column != NULL){
for(int i = 0; i < col_length[c]; i++){
column[c][i].val = cY.val[cY.col_ptr[c]+i];
column[c][i].index = cY.row_ind[cY.col_ptr[c]+i];
}
}
}
for(int r = 0; r < X.row; r++){
for(int c = 0; c < Y.column; c++){
product = 0;
if(row[r] != NULL && column[c] != NULL)
product=compute(row[r], column[c], row_length[r], col_length[c]);
if(product){
result->elements[result->numberofElements].row = r;
result->elements[result->numberofElements].column = c;
result->elements[result->numberofElements].id = r * result->column + c;
result->elements[result->numberofElements].value = product;
result->numberofElements++;
}
}
}
result->elements = (element *)realloc( result->elements , result->numberofElements * sizeof(element));
for(int i = 0 ; i < X.row; i++){
if(row[i] != NULL)
free(row[i]);
}
free(row);
for(int i = 0 ; i < Y.column; i++){
if(column[i] != NULL)
free(column[i]);
}
free(column);
free(row_length);
free(col_length);
cleanCSR(&cX);
cleanCSC(&cY);
}
I used gprof to prof it and it shows that if I multiply 4 1000*1000 matrix the program will take about 7s to compute the output and 98.7% of time it is running this function.
How can I improve it? I have tried a lot of things like trade-off some space but the time just doesn't go down. I think I am kind of stuck.
1 Answer 1
Here is my understanding of what you are doing:
- you have some matrices stored in files using your own 'naive' format.
- you read these into your
matrix
structures - two of these matrices are then passed to
multiply()
- you convert one of the matrices into a Compressed Row Storage
- and the other into a Compressed Column Storage
- you then extract the matrices from these compressed forms into arrays of values, one for rows and one for columns
- and finally you multiply them.
I find that complicated!
- Why not store the matrices directly in compressed format?
- Or convert your naive format into the extracted form without the CSC/CSR stage?
- Why use both CSR and CSC? They are both methods of compressing a matrix but they do the same job. I can see no point in using both.
With these defects, there is not much point in reviewing the code in detail, but here are some general points:
- include the right headers and don't cast the return from
malloc
know that
calloc
gives you a zeroed arrayuse
const
where possible on function parameters (eg row/column incompute
, X/Y inmultiply
)use
static
where possible for functionscompute
andmultiply
are badly namedimprove your naming in general. You have four structures all containing fields named
row
andcolumn
. I'm sure at least two of these contain the number of rows/columns (eg.n_rows
). And yourtuple
lists are calledrow
andcolumn
. It all becomes very confusing.pass by reference is better for structures.
your
multiply
does a lot more than just multiply the matrices. It first compresses the input matrices and then expands the compressed matrices. These steps belong elsewhere (if they belong anywhere). Your first two big for-loops shoud be separate functions. The nested-for in each might well belong in a separate function too (difficult to tell, but nested loops are often better split). And the free-loops should also be separate functions. By splitting a big function into several smaller ones (with appropriate names), you make the code easier to read and test.
Sorry to be negative.
-
1\$\begingroup\$ Thanks. I will try to follow your advice and modify my code. Thanks indeed. \$\endgroup\$Boyu Fang– Boyu Fang2013年01月06日 20:29:46 +00:00Commented Jan 6, 2013 at 20:29
tuple
type andmatrixtoCSR
andmatrixtoCSC
are still missing. Also please post amain()
with a simple test case. BTW why do you use a CSC and a CSR? \$\endgroup\$