Read consecutive blanks in array

Question 1

I am writing a function that reads all consecutive blanks in a string, moves the pointer to the next non-blank char, and return a struct "token" that contains the blanks it found.

The main function is defined as

int match_ws(const char** p, struct token* tok)

where p points to the beginning of the strings to analyze, and tok will contain the result after the function returns. Return value is either 1 (successful), or 0.

[EDIT]

match_ws starts analyzing the string given as first input, and reads all consecutive whitespaces, while advancing the pointer too. It stores all blanks it found in the struct (second input) and returns 1. If it didn't find any blank, or too many, returns 0 without modifying tok.

So, if the input is const char* p = " . +" then after match_ws ran the tok structure will contain lexeme = " " (the four spaces at the beginning of p, and type = WS (enum type for "whitespace"). If the input is const char* p = ". ; )" then the function just returns 0 without modifying tok, because **p is not a whitespace (it's a dot).

Overall, this is part of a parser, and match_ws is the function that "consumes" all whitespaces -- not relevant for the parser. I believe the function behaves correctly, but since I am not experienced with the C language, I would like to have reviews about how I handled allocation of memory, pointer, pointer to pointers, function definition, etc.

[/EDIT]

Overall, my questions are about correct usage of pointers, pointers-to-pointers, good choice in the type of arguments (for the function definitions), and memory management.

I wrote my questions in the comments, but I am not so used to memory management and coding standards in C, so would appreciate a code review and all suggestions welcome.

I also added some boilerplate code so the file can be compiled and executed. As you can guess is part of a larger code, but I reduced it to be a minimal example.

#include <stdio.h>
#include <stdlib.h>
#define MAX_LINE 100
/*
 * Some boilerplate code
*/
enum token_type {
 WS, // white spaces
 Eof
};
struct token{
 char* lexeme;
 enum token_type type;
};
char consume(const char** p);
/*
 * Main function to work on.
 * Questions:
 * - char ws[MAX_LINE] or using malloc?
 * - is it OK to initialize char ws with "" ?
 * - Am I appending char into ws in a optimal way?
 * - is it OK to return int and modify input argument tok, or should create a new tok and return it?
 * - makes sense to have pointer-to-pointer as input? Else what?
*/
int match_ws(const char** p, struct token* tok) {
 /*
 * Find all consecutives blanks starting from position p
 * Modify the input struct token.
 * Return 1 if execution was successful, 0 otherwise.
 */
 if ( **p != ' ' &&
 **p != '\t' &&
 **p != '\r' &&
 **p != '\n')
 {
 printf("Bad call to whitespace!!");
 return 0;
 }
 int i = 0;
 char c;
 char ws[MAX_LINE] = "";
 do {
 if (i >= MAX_LINE) {
 printf("Found more than %d white spaces", MAX_LINE);
 return 0;
 }
 c = consume(p);
 printf("WHITESPACE : Current char is |%c|\n", c);
 ws[i++] = c;
 }
 // Here p has been already advanced by the call to consume fn
 while (**p == ' ' || **p == '\t' || **p == '\r' || **p == '\n');
 tok->lexeme = ws;
 tok->type = WS;
 return 1;
}
/*
 * More boilerplate code to compile
*/
const char* type2char (enum token_type t) {
 switch (t)
 {
 case WS: return "WS";
 default: return "UNK";
 }
}
void print_token(struct token* p) {
 printf("<%s, %s>\n", p->lexeme, type2char(p->type));
}
char consume(const char** p) {
 // Advances the pointer while reading the input
 char c = **p;
 (*p)++;
 return c;
}
/*
 * Main.
 * Questions:
 * - is it OK to pass arguments to match_ws via '&' notation?
 * - Makes sense to have const char* line? Or just char* ?
 * - Should tok be initialized somehow?
 * - Is there anything that has to be free() eventually?
*/
int main() {
 const char* line = " +";
 struct token tok;
 match_ws(&line, &tok);
 print_token(&tok);
}

Question 2

I find it hard to understand the more exact outcome of match_ws interface to look deeper into the code. Possibly if you add some examples get a better grasp of the interface intention.

Question 3

You seem to have registered with two different accounts; you'll need to flag for moderator attention to have your accounts merged.

Question 4

Answers to Questions

char ws[MAX_LINE] or using malloc?

The function match_ws() should return a pointer to a struct token. The token should be allocated by the match_ws() function, the contents of the token should also use malloc() to allocate the string that token->lexeme points to. Right now the code returns the address of a variable on the stack, this is not safe and can lead to undefined behavior (UB).

is it OK to initialize char ws with "" ?

It is okay to initialize an array of char with "". To initialize a single character use single quotes instead.

Am I appending char into ws in a optimal way?

Using pointers would be more optimal.

is it OK to return int and modify input argument tok, or should create a new tok and return it?

See the answer to the first question.

makes sense to have pointer-to-pointer as input? Else what?

It would be better to use a single pointer rather than a double pointer. It is possible to corrupt the input when it is a double pointer.

is it OK to pass arguments to match_ws via '&' notation?

It is OK to pass the the token argument using the & notation if you don't want to allocate a token in the function, it would be better to pass line without the & notation.

Makes sense to have const char* line? Or just char* ?

The use of const is appropriate here.

Should tok be initialized somehow?

The C programming language is a very unforgiving programming language: all variables should be initialized.

Is there anything that has to be free() eventually?

Not in the current implementation.

Don't Reinvent the Wheel

Checking the values of input characters is very common in the C programming language; a function is provided for checking whitespace, and another function is provided for checking alphanumeric characters. What you do need to do is include <ctype.h>. The advantage of doing this is that the current code probably misses some white space characters that isspace() would find.

Code Organization

Function prototypes are very useful in large programs that contain multiple source files, and that in case they will be in header files. In a single file program like this it is better to put the main() function at the bottom of the file and all the functions that get used in the proper order above main(). Keep in mind that every line of code written is another line of code where a bug can crawl into the code.

pacmaninbw ♦pacmaninbw 26.1k13 gold badges47 silver badges114 bronze badges · Answer 1 · 2022-07-15 21:19:44Z

Answers to Questions

char ws[MAX_LINE] or using malloc?

The function match_ws() should return a pointer to a struct token. The token should be allocated by the match_ws() function, the contents of the token should also use malloc() to allocate the string that token->lexeme points to. Right now the code returns the address of a variable on the stack, this is not safe and can lead to undefined behavior (UB).

is it OK to initialize char ws with "" ?

It is okay to initialize an array of char with "". To initialize a single character use single quotes instead.

Am I appending char into ws in a optimal way?

Using pointers would be more optimal.

is it OK to return int and modify input argument tok, or should create a new tok and return it?

See the answer to the first question.

makes sense to have pointer-to-pointer as input? Else what?

It would be better to use a single pointer rather than a double pointer. It is possible to corrupt the input when it is a double pointer.

is it OK to pass arguments to match_ws via '&' notation?

It is OK to pass the the token argument using the & notation if you don't want to allocate a token in the function, it would be better to pass line without the & notation.

Makes sense to have const char* line? Or just char* ?

The use of const is appropriate here.

Should tok be initialized somehow?

The C programming language is a very unforgiving programming language: all variables should be initialized.

Is there anything that has to be free() eventually?

Not in the current implementation.

Don't Reinvent the Wheel

Checking the values of input characters is very common in the C programming language; a function is provided for checking whitespace, and another function is provided for checking alphanumeric characters. What you do need to do is include <ctype.h>. The advantage of doing this is that the current code probably misses some white space characters that isspace() would find.

Code Organization

Function prototypes are very useful in large programs that contain multiple source files, and that in case they will be in header files. In a single file program like this it is better to put the main() function at the bottom of the file and all the functions that get used in the proper order above main(). Keep in mind that every line of code written is another line of code where a bug can crawl into the code.

Stack Exchange Network

Read consecutive blanks in array

1 Answer 1

Answers to Questions

Don't Reinvent the Wheel

Code Organization

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Read consecutive blanks in array

1 Answer 1

Answers to Questions

Don't Reinvent the Wheel

Code Organization

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions