Parsing A Simple Matrix Definition In C

Question 1

In one of my homework projects, I have to parse a matrix definition into a matrix data structure in C. The elements of the matrix will be given in a list of double. I need a simple tokenizer and parser for the task naturally, but I have not much experience in parsing. I implemented the tokenizer and parser. The tokenizer produces a few types of tokens. The list of token types includes tokEot (ie. End of tokens), tokId, tokNumber, tokOparen, tokCparen, tokComma, tokAssign. The matrix definition will be in the form of identifier(Integer, Integer) = ( Real Number, Real Number, ... , Real Number). The input can have more than one matrix definition. The name of the data structure for the tokenizer is scanner. I have no data structure for the parser, implemented it in a single function(parse_matrix). I need also error checking in the parsing stage, so if the order in the definition of a matrix is not be followed, then I must report an error and exit the program. You can look at the supplementary code for more details. The program works without any problems. However, since I need a code review specifically for the function parse_matrix and it might be redundant, I don't want to include the source code of the tokenizer (maybe in another question). Here is the code with the all necessary parts.

typedef struct matrix { 
 char* name; // The name of the matrix, the parsed identifiers will be stored here.
 int m; // The number of rows
 int n; // The number of cols
 double* elements; // A chunk of doubles 
}matrix;
// The enum for the token types
typedef enum token_type
{
 tokEot, // End of tokens
 tokId, // Identifier
 tokNumber, // Either int or double
 tokOparen, // (
 tokCparen, // )
 tokComma, // ,
 tokAssign, // =
 tokCount
}token_type;
typedef struct token_t
{
 token_type type;
 union
 {
 double data_double;
 char* data_string;
 int data_int;
 };
}token_t;
typedef struct scanner
{
 token_t current_token; // Last generated token
 char* input_string; // Input string that contains matrix definitions
 char* current_char; // Current character to be processed.
}scanner;
token_t next_token(scanner* pthis);
matrix* parse_matrix(scanner* pscanner);
void handle_error(const char* error_text);
matrix* new_matrix(char* name, int m, int n);
int main()
{
 char* input = "mat1(3, 3) = ( 1.0, -3.0, 5.0, "
 "2.0, -1.0, 5.0, "
 "1.0, -3.0, 5.0 )";
 
 scanner* float_scanner = new_scanner(input);
 matrix* mat1 = parse_matrix(float_scanner); 
 system("pause");
}
matrix* new_matrix(char* name, int m, int n)
{
 matrix* mat = malloc(sizeof(matrix));
 mat->m = m;
 mat->n = n;
 mat->name = name;
 mat->elements = malloc(sizeof(double) * m * n);
 return mat;
}
// tokId --> ( --> Int --> , --> Int --> ) --> = --> ( --> Double --> , --> Double ... --> ) 
matrix* parse_matrix(scanner* pthis)
{
 matrix* mat;
 int count = 0;
 char* id_name = NULL;
 int m = 0;
 int n = 0;
 token_t pre_token;
 token_t token;
 do
 {
 token = next_token(pthis);
 //If token type is a identifier, then we will try to parse a matrix declaration.
 if (token.type == tokId)
 {
 // Duplicate the token data
 id_name = strdup(token.data_string);
 token = next_token(pthis); 
 // I must have an open parenthesis after an identifier 
 if (token.type == tokOparen)
 {
 token = next_token(pthis);
 // Parsing the dimensions of the matrix
 if (token.type == tokNumber)
 {
 m = token.data_int;
 token = next_token(pthis);
 // There is a comma between the dimensions of matrix 
 if (token.type == tokComma)
 {
 token = next_token(pthis);
 // A number must follow the comma
 if (token.type == tokNumber)
 {
 n = token.data_int;
 token = next_token(pthis);
 if (token.type == tokCparen)
 {
 // Allocate the matrix with the specified dimensions.
 mat = new_matrix(id_name, m, n);
 token = next_token(pthis);
 if (token.type == tokAssign)
 {
 token = next_token(pthis);
 if (token.type == tokOparen)
 {
 token = next_token(pthis);
 // Finally, parse the elements.
 if (token.type == tokNumber)
 *(mat->elements + count++) = token.data_double;
 else
 handle_error("Expected a number after open paranthesis.");
 do
 {
 token = next_token(pthis);
 if (token.type == tokCparen)
 break;
 if (token.type == tokComma)
 {
 token = next_token(pthis);
 if (token.type == tokNumber)
 *(mat->elements + count++) = token.data_double;
 else
 handle_error("Expected a number after a comma.");
 }
 else
 handle_error("Expected a comma after a number.");
 } while (token.type != tokCparen);
 // If the user didn't specify enough elements, we should
 // fill the rest with zero.
 if (count < m * n)
 for (int i = count; i < m * n; i++)
 *(mat->elements + i) = 0.0;
 return mat;
 }
 else
 handle_error("Expected an open parenthesis after an assignment.");
 }
 else
 handle_error("Expected an assigment.");
 }
 else
 handle_error("Expected a close parenthesis.");
 }
 else
 handle_error("Expected a integer number.");
 }
 else
 handle_error("Expected a comma.");
 }
 else
 handle_error("Expected a integer number.");
 }
 else
 handle_error("Expected an open parenthesis.");
 }
 } while (token.type != tokEot);
}
void handle_error(const char* error_text)
{
 perror(error_text);
 exit(-1);
}
```

Question 2

Code looks pretty good. I prefer some other style guide, but this looks consistent, and consistency is important too.

Indentation

Arbitrarily, you can spell the string constant like this:

char* input = 
 "mat1(3, 3) = "
 "( 1.0, -3.0, 5.0, "
 "2.0, -1.0, 5.0, "
 "1.0, -3.0, 5.0 )";

Placing the similar tokens directly one under another can find errors in those tokens in a moment.

Unit tests

You should test your code. There are literally dozens of unit test frameworks for C; and using unit tests will make you immediately find if new changes would break something later. Just do it. Besides, you're already testing the code, but in a much less efficient way - this code has one test in a main() function.

Mixing model and presentation

Data transformations should be some inner job, usually called model, while reporting errors to user is for presentation. The parser would be much more portable and reusable if it wouldn't rely on a specific output system, maybe returning NULL on errors.

`system("pause");`

The worst thing in this code. It relies on specific OS and calls a far more complex program than this one for just one discarded input. Your IDE probably has some tools to capture the output after the program exit, just look for it.

`if-else` staircase

I think this is why you've posted the code here. Well, it may be a good habit to put more likely option into the if branch and less likely into the else; but here the execution stops on handle_error, so it can be rewritten in a much more linear way:

token = next_token(pthis); 
if (token.type != tokOparen)
 handle_error("Expected an open parenthesis.");
token = next_token(pthis);
if (token.type != tokNumber)
 handle_error("Expected a integer number.");

etc. This will make the code not so structured (though it isn't structured now), but much more readable - except for one detail. It's not clear for readers that handle_error breaks the execution. So, maybe, it should be something like

token = next_token(pthis); 
if (token.type != tokOparen)
 return handle_error("Expected an open parenthesis.");
token = next_token(pthis);
if (token.type != tokNumber)
 return handle_error("Expected a integer number.");

This looks much better. Of course, handle_error now should return NULL after exit(-1). Maybe commented out to avoid compiler warnings.

One other option for such situations is a do-while(false); construction:

do { //while(false) - you should always comment this
 do_some_work();
 if(something_wrong())
 break;
 do_some_more_work();
 if(one thing more wrong)
 break;
 do_even_more_work();
 if(still wrong)
 break;
 do_final_work();
} while(false);

Question 3

+1 for your answer. But I cannot upvote since I have not enough reputation. That linear way sounds better!

Question 4

I needed really such a style guide, +1 also for that link.

Pavlo Slavynskyy Pavlo Slavynskyy 2,3746 silver badges11 bronze badges · Accepted Answer · 2021-12-04 21:01:08Z

Code looks pretty good. I prefer some other style guide, but this looks consistent, and consistency is important too.

Indentation

Arbitrarily, you can spell the string constant like this:

char* input = 
 "mat1(3, 3) = "
 "( 1.0, -3.0, 5.0, "
 "2.0, -1.0, 5.0, "
 "1.0, -3.0, 5.0 )";

Placing the similar tokens directly one under another can find errors in those tokens in a moment.

Unit tests

You should test your code. There are literally dozens of unit test frameworks for C; and using unit tests will make you immediately find if new changes would break something later. Just do it. Besides, you're already testing the code, but in a much less efficient way - this code has one test in a main() function.

Mixing model and presentation

Data transformations should be some inner job, usually called model, while reporting errors to user is for presentation. The parser would be much more portable and reusable if it wouldn't rely on a specific output system, maybe returning NULL on errors.

`system("pause");`

The worst thing in this code. It relies on specific OS and calls a far more complex program than this one for just one discarded input. Your IDE probably has some tools to capture the output after the program exit, just look for it.

`if-else` staircase

I think this is why you've posted the code here. Well, it may be a good habit to put more likely option into the if branch and less likely into the else; but here the execution stops on handle_error, so it can be rewritten in a much more linear way:

token = next_token(pthis); 
if (token.type != tokOparen)
 handle_error("Expected an open parenthesis.");
token = next_token(pthis);
if (token.type != tokNumber)
 handle_error("Expected a integer number.");

etc. This will make the code not so structured (though it isn't structured now), but much more readable - except for one detail. It's not clear for readers that handle_error breaks the execution. So, maybe, it should be something like

token = next_token(pthis); 
if (token.type != tokOparen)
 return handle_error("Expected an open parenthesis.");
token = next_token(pthis);
if (token.type != tokNumber)
 return handle_error("Expected a integer number.");

This looks much better. Of course, handle_error now should return NULL after exit(-1). Maybe commented out to avoid compiler warnings.

One other option for such situations is a do-while(false); construction:

do { //while(false) - you should always comment this
 do_some_work();
 if(something_wrong())
 break;
 do_some_more_work();
 if(one thing more wrong)
 break;
 do_even_more_work();
 if(still wrong)
 break;
 do_final_work();
} while(false);

+1 for your answer. But I cannot upvote since I have not enough reputation. That linear way sounds better!

Stack Exchange Network

Parsing A Simple Matrix Definition In C

1 Answer 1

Indentation

Unit tests

Mixing model and presentation

`system("pause");`

`if-else` staircase

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Parsing A Simple Matrix Definition In C

1 Answer 1

Indentation

Unit tests

Mixing model and presentation

system("pause");

if-else staircase

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

`system("pause");`

`if-else` staircase