Basic recursive descent parser in C

Question 1

The purpose of this code is to evaluate simple integer expressions that use the C arithmetic and bitwise operators while respecting C operator precedence and parentheses. I wrote this code with portability in mind, making no use of implementation-defined or undefined constructs to my knowledge. However, there are a lot of duplicated patterns here. If someone could suggest how to rewrite the code more compactly, and confirm that my code is in fact portable standard C, that would be great.

The code works by dividing the expression into groups separated by the lowest precedence operator, and then dividing those groups into groups separated by the second lowest precedence operator, and so on.

#include <stdlib.h>
#include <ctype.h>
#include <errno.h>
#include <setjmp.h>
static jmp_buf jmp;
static unsigned long p6(const char **);
static unsigned long p0(const char **const str) {
    if (isdigit(**str)) return strtoul(*str, (char **)str, 0);
    else switch (*(*str)++) {
    case '+':
        return +p0(str);
    case '-':
        return -p0(str);
    case '~':
        return ~p0(str);
    case '(': {
        const unsigned long r = p6(str);
        if (*(*str)++ == ')') return r;
    }
    }
    longjmp(jmp, errno = EINVAL);
}
static unsigned long p1(const char **const str) {
    unsigned long r = p0(str);
    for (;;) {
        switch (**str) {
        case '*':
            ++*str;
            r *= p0(str);
            continue;
        case '/':
            ++*str;
            r /= p0(str);
            continue;
        case '%':
            ++*str;
            r %= p0(str);
            continue;
        }
        return r;
    }
}
static unsigned long p2(const char **const str) {
    unsigned long r = p1(str);
    for (;;) {
        switch (**str) {
        case '+':
            ++*str;
            r += p1(str);
            continue;
        case '-':
            ++*str;
            r -= p1(str);
            continue;
        }
        return r;
    }
}
static unsigned long p3(const char **const str) {
    unsigned long r = p2(str);
    for (;;) {
        switch (**str) {
        case '<':
            ++*str;
            r <<= p2(str);
            continue;
        case '>':
            ++*str;
            r >>= p2(str);
            continue;
        }
        return r;
    }
}
static unsigned long p4(const char **const str) {
    unsigned long r = p3(str);
    for (; **str == '&'; r &= p3(str)) ++*str;
    return r;
}
static unsigned long p5(const char **const str) {
    unsigned long r = p4(str);
    for (; **str == '^'; r ^= p4(str)) ++*str;
    return r;
}
static unsigned long p6(const char **const str) {
    unsigned long r = p5(str);
    for (; **str == '|'; r |= p5(str)) ++*str;
    return r;
}
unsigned long parse(const char **const str) {
    return setjmp(jmp) ? 0 : p6(str);
}
#include <stdio.h>
int main(const int argc, const char **argv) {
    while (*++argv) {
        const unsigned long r = parse(&(const char *){*argv});
        if (errno) {
            perror(*argv);
            errno = 0;
            continue;
        }
        if (printf("%lu\n", r) < 0)
            return errno;
    }
    return 0;
}

Question 2

Portable?

"I wrote this code with portability in mind," --> Standard C lacks EINVAL. The standard <errno.h> is only required to define EDOM, EILSEQ and ERANGE.

longjmp(jmp, errno = EINVAL);
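
One possible way around this, sketched under the assumption that callers only need a success/failure indication: let the error travel through the longjmp() value and a return code instead of errno. The names parse_jmp, fail, number and parse_checked are hypothetical, the stand-in grammar accepts just one unsigned number, and rejecting trailing characters is an added assumption, not something the original code does.

```c
#include <ctype.h>
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf parse_jmp;  /* hypothetical name */

/* Report a syntax error without touching errno. */
static void fail(void) {
    longjmp(parse_jmp, 1);  /* any nonzero value marks failure */
}

/* Stand-in for the p0..p6 chain: accepts a single unsigned number. */
static unsigned long number(const char **str) {
    char *end;
    unsigned long r;
    if (!isdigit((unsigned char)**str)) fail();
    r = strtoul(*str, &end, 0);
    *str = end;
    return r;
}

/* Returns 0 on success and stores the value in *out; returns -1 on error. */
int parse_checked(const char *s, unsigned long *out) {
    if (setjmp(parse_jmp) != 0) return -1;  /* reached via fail() */
    *out = number(&s);
    if (*s != '\0') return -1;  /* assumption: trailing text is an error */
    return 0;
}
```

A caller would then test the return value, for example `if (parse_checked(argv[i], &r) != 0) { /* report error */ }`, and no non-standard errno constant is needed.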

Negative char

isdigit(**str) is UB when **str < 0 and not EOF. Best to only pass unsigned char values to is...().
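
A minimal sketch of the usual fix, assuming a small wrapper is acceptable (the name is_digit_char is hypothetical): convert to unsigned char before calling isdigit(), so a negative char value never reaches it.

```c
#include <ctype.h>

/* Hypothetical helper: nonzero if c is a decimal digit.
   The cast keeps the argument within the range isdigit() accepts. */
int is_digit_char(char c) {
    return isdigit((unsigned char)c) != 0;
}
```

In p0 this would read `if (is_digit_char(**str)) ...`, or simply `isdigit((unsigned char)**str)` in place.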

Pedantic: Code assumes argc > 0

If argc == 0, argv[0] is already the terminating null pointer, so *++argv reads past the end of the array.

(Recall portability) Rather than seek the null pointer at the end of the argv[] list, consider indexing.

// int main(const int argc, const char **argv) {
// while (*++argv) {
// const unsigned long r = parse(&(const char *){*argv});
int main(const int argc, const char **argv) {
    for (int i = 1; i < argc; i++) {
        const unsigned long r = parse(&(const char *){argv[i]});

No white-space

I'd expect such parsers to be white-space friendly and not stop at the first ' '. I see how this may overcomplicate the code for this initial effort.
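
As a rough sketch of how little is needed, assuming a helper that each precedence level calls before it inspects **str (the name skip_space is hypothetical):

```c
#include <ctype.h>

/* Hypothetical helper: advance *str past any white-space. */
void skip_space(const char **str) {
    while (isspace((unsigned char)**str)) ++*str;
}
```

strtoul() already skips leading white-space on its own, but the isdigit() test in p0 runs first, so the skip would still be needed there and before each operator test.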

<, > for shifting?

Consider << for shift and < for compare.
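
One way this could look, as a sketch only (match2 is a hypothetical helper, not part of the original code): test for a two-character token and consume it only when both characters match, which leaves a lone < or > available for a comparison level.

```c
/* Hypothetical helper: if the text at *str starts with the two-character
   token tok, consume it and return 1; otherwise leave *str unchanged and
   return 0. tok must point to two non-null characters. */
int match2(const char **str, const char *tok) {
    /* (*str)[1] is only read when (*str)[0] matched a non-null character,
       so the read never goes past the string terminator. */
    if ((*str)[0] == tok[0] && (*str)[1] == tok[1]) {
        *str += 2;
        return 1;
    }
    return 0;
}
```

In p3 the switch would then become something like `if (match2(str, "<<")) r <<= p2(str); else if (match2(str, ">>")) r >>= p2(str); else return r;`.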

Vertical spacing

Some vertical space would improve code readability.

Non-standard main() signature

Drop the const.

Some compilers may whine. Note: "The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program," C17dr § 5.1.2.2.1 2.

// int main(const int argc, const char **argv) {
int main(int argc, char **argv) {

Minor: Unneeded else

// if (isdigit(**str)) return strtoul(*str, (char **)str, 0);
// else switch (*(*str)++) {
if (isdigit(**str)) return strtoul(*str, (char **)str, 0);
switch (*(*str)++) {

right-to-left Bug?

Evaluation of unary plus/minus and bitwise complement looks left-to-right. I'd expect right-to-left.
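
To probe this concern, here is a small sketch that mirrors the recursive shape of p0 (the name unary is hypothetical, and only unary operators and numbers are handled). Each operator is applied to the value returned by the recursive call, so the result can be compared with what C itself computes for the same expression.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical reduced p0: unary + - ~ applied to an unsigned number. */
unsigned long unary(const char **str) {
    if (isdigit((unsigned char)**str)) {
        char *end;
        unsigned long r = strtoul(*str, &end, 0);
        *str = end;
        return r;
    }
    switch (*(*str)++) {
    case '+': return +unary(str);
    case '-': return -unary(str);
    case '~': return ~unary(str);
    }
    return 0;  /* unrecognized input; the original code longjmps here */
}
```

In this reduced form, "-~1" evaluates to the same value as the C expression -~1UL, and "~-1" to the same value as ~-1UL, which suggests the operator nearest the operand is applied first.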

Subtle good code

Code uses parse(&(const char *){*argv}) rather than parse(argv). parse(argv) may modify *argv and that is potential UB.

Minor: errno = 0

Best set just before the parse() call.

With the code as is, it makes no difference, yet from a template point of view it is better for example code to set errno = 0 explicitly.

 errno = 0; // Add
 const unsigned long r = parse(&(const char *){*argv});

printf() and errno

Unclear why code returns errno. printf() is not specified to set errno under any condition.

Consider:

if (printf("%lu\n", r) < 0)
    // return errno;
    return EXIT_FAILURE;

Question 3

argc is never 0. argv[0] is the program name... right?

Question 4

@user16217248 "argc is never 0" --> never is a long time. Handle argc equal to 0. Better to code to the spec than to experience alone.

Question 5

But... why? Is there any rationale for this seemingly pointless allowance of argc == 0? Is it really that hard to require that implementations with no concept of a program name pass a pointer to a null byte for argv[0] and make argc >= 1?

Question 6

Also, since parameters are local to the function in which they are declared, declaring argc as const is invisible to any implementation.

Question 7

I quoted some of your observations in the review of the latest question.

chux · Accepted Answer · 2023-02-27 22:28:04Z

