The purpose of this code is to evaluate simple integer expressions that use the C arithmetic and bitwise operators while respecting C operator precedence and parentheses. I wrote this code with portability in mind, making no use of implementation-defined or undefined constructs, to my knowledge. However, there are a lot of duplicated patterns here. If someone could provide some insight into how to rewrite the code more compactly, and confirm that my code is in fact portable standard C, that would be great.
The code works by dividing the expression into groups separated by the lowest-precedence operator, then dividing those groups into groups separated by the second-lowest-precedence operator, and so on.
#include <stdlib.h>
#include <ctype.h>
#include <errno.h>
#include <setjmp.h>
static jmp_buf jmp;
static unsigned long p6(const char **);
static unsigned long p0(const char **const str) {
    if (isdigit(**str)) return strtoul(*str, (char **)str, 0);
    else switch (*(*str)++) {
    case '+':
        return +p0(str);
    case '-':
        return -p0(str);
    case '~':
        return ~p0(str);
    case '(': {
        const unsigned long r = p6(str);
        if (*(*str)++ == ')') return r;
    }
    }
    longjmp(jmp, errno = EINVAL);
}
static unsigned long p1(const char **const str) {
    unsigned long r = p0(str);
    for (;;) {
        switch (**str) {
        case '*':
            ++*str;
            r *= p0(str);
            continue;
        case '/':
            ++*str;
            r /= p0(str);
            continue;
        case '%':
            ++*str;
            r %= p0(str);
            continue;
        }
        return r;
    }
}
static unsigned long p2(const char **const str) {
    unsigned long r = p1(str);
    for (;;) {
        switch (**str) {
        case '+':
            ++*str;
            r += p1(str);
            continue;
        case '-':
            ++*str;
            r -= p1(str);
            continue;
        }
        return r;
    }
}
static unsigned long p3(const char **const str) {
    unsigned long r = p2(str);
    for (;;) {
        switch (**str) {
        case '<':
            ++*str;
            r <<= p2(str);
            continue;
        case '>':
            ++*str;
            r >>= p2(str);
            continue;
        }
        return r;
    }
}
static unsigned long p4(const char **const str) {
    unsigned long r = p3(str);
    for (; **str == '&'; r &= p3(str)) ++*str;
    return r;
}
static unsigned long p5(const char **const str) {
    unsigned long r = p4(str);
    for (; **str == '^'; r ^= p4(str)) ++*str;
    return r;
}
static unsigned long p6(const char **const str) {
    unsigned long r = p5(str);
    for (; **str == '|'; r |= p5(str)) ++*str;
    return r;
}
unsigned long parse(const char **const str) {
    return setjmp(jmp) ? 0 : p6(str);
}
#include <stdio.h>
int main(const int argc, const char **argv) {
    while (*++argv) {
        const unsigned long r = parse(&(const char *){*argv});
        if (errno) {
            perror(*argv);
            errno = 0;
            continue;
        }
        if (printf("%lu\n", r) < 0)
            return errno;
    }
    return 0;
}
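One possible way to collapse p1() through p6(), which the question asks about, is a single function indexed by precedence level, with a string of operator characters per level. The sketch below is not from the post, and the names eval_expr, binary, unary, apply and levels are hypothetical. It also departs from the original in a few assumed ways: it reports errors through its return value instead of errno, rejects trailing characters, and rejects division by zero and oversized shift counts, both of which are undefined behavior in C.

```c
#include <ctype.h>
#include <limits.h>
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

static jmp_buf parse_fail; /* error exit shared by every level */

/* Binary operators, lowest precedence first: index 0 plays the role of p6()
   ('|') and the last entry the role of p1() ('*', '/', '%'). */
static const char *const levels[] = { "|", "^", "&", "<>", "+-", "*/%" };
enum { NLEVELS = sizeof levels / sizeof levels[0] };

static unsigned long binary(const char **str, int level);

/* Numbers, unary operators and parentheses: the role of p0(). */
static unsigned long unary(const char **str) {
    if (isdigit((unsigned char)**str)) {
        char *end; /* avoids casting const char ** to char ** */
        unsigned long v = strtoul(*str, &end, 0);
        *str = end;
        return v;
    }
    switch (*(*str)++) {
    case '+': return +unary(str);
    case '-': return -unary(str);
    case '~': return ~unary(str);
    case '(': {
        unsigned long r = binary(str, 0);
        if (*(*str)++ == ')') return r;
        break;
    }
    }
    longjmp(parse_fail, 1); /* unexpected character or missing ')' */
}

static unsigned long apply(char op, unsigned long a, unsigned long b) {
    switch (op) {
    case '|': return a | b;
    case '^': return a ^ b;
    case '&': return a & b;
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    case '/': case '%':
        if (b == 0) longjmp(parse_fail, 1); /* division by zero */
        return op == '/' ? a / b : a % b;
    default: /* '<' or '>' */
        /* shift count too large; assumes unsigned long has no padding bits */
        if (b >= sizeof a * CHAR_BIT) longjmp(parse_fail, 1);
        return op == '<' ? a << b : a >> b;
    }
}

/* One function for every binary level: operands come from the next,
   tighter-binding level, and the loop makes each level left-associative. */
static unsigned long binary(const char **str, int level) {
    if (level == NLEVELS) return unary(str);
    unsigned long r = binary(str, level + 1);
    while (**str != '\0' && strchr(levels[level], **str)) {
        char op = *(*str)++;
        r = apply(op, r, binary(str, level + 1));
    }
    return r;
}

/* Returns 1 and stores the value in *out on success, 0 on any error. */
int eval_expr(const char *s, unsigned long *out) {
    const char *p = s;
    if (setjmp(parse_fail)) return 0;
    unsigned long r = binary(&p, 0);
    if (*p != '\0') return 0; /* trailing characters, e.g. "1)" */
    *out = r;
    return 1;
}
```

For example, eval_expr("(1+2)*3", &v) returns 1 with v == 9, and eval_expr("(1", &v) returns 0. The design choice is that adding or regrouping a precedence level becomes an edit to the levels table rather than another near-identical function.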
Portable?
"I wrote this code with portability in mind," --> Standard C lacks EINVAL; it only defines EDOM, EILSEQ and ERANGE.
longjmp(jmp, errno = EINVAL);
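If the error code should stay in errno, one portable option (an assumption, not something the review prescribes) is to use EINVAL only where the implementation defines it and fall back to EDOM, which ISO C guarantees. Because errno values are macros, an #ifdef test can tell. PARSE_ERRNO is a hypothetical name.

```c
#include <errno.h>

/* Hypothetical macro for the parser's error code. ISO C only guarantees
   EDOM, EILSEQ and ERANGE; EINVAL comes from POSIX and similar systems. */
#ifdef EINVAL
#define PARSE_ERRNO EINVAL
#else
#define PARSE_ERRNO EDOM
#endif

/* In p0() the error exit would then read:
       longjmp(jmp, errno = PARSE_ERRNO);                                  */
```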
Negative char
isdigit(**str) is UB when **str < 0 and not EOF. Best to only pass unsigned char values to is...().
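A minimal sketch of the fix, assuming a named helper is wanted (is_digit_char is a hypothetical name); writing isdigit((unsigned char)**str) directly in p0() is equivalent.

```c
#include <ctype.h>

/* Hypothetical helper. Converting to unsigned char first keeps the argument
   in the range isdigit() accepts, even where plain char is signed. */
static int is_digit_char(char c) {
    return isdigit((unsigned char)c) != 0;
}

/* In p0():
       if (is_digit_char(**str)) return strtoul(*str, (char **)str, 0);   */
```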
Pedantic: Code assumes argc > 0
If argc == 0, argv[0] is already the null pointer, so ++argv steps past it and *++argv reads beyond the end of the argv[] array.
(Recall portability) Rather than seek the null pointer at the end of the argv[] list, consider indexing.
// int main(const int argc, const char **argv) {
//     while (*++argv) {
//         const unsigned long r = parse(&(const char *){*argv});
int main(const int argc, const char **argv) {
    for (int i = 1; i < argc; i++) {
        const unsigned long r = parse(&(const char *){argv[i]});
No white-space
I'd expect such parsers to be white-space friendly and not stop at some ' '. I see how this may overcomplicate the code for this initial effort.
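One low-cost way to add white-space tolerance, assuming the rest of the structure stays the same, is a helper called at the start of p0() and before each operator test in p1() through p6(). skip_space is a hypothetical name.

```c
#include <ctype.h>

/* Hypothetical helper: advance *str past any white-space characters.
   The unsigned char cast keeps isspace() defined for negative char values. */
static void skip_space(const char **str) {
    while (isspace((unsigned char)**str))
        ++*str;
}
```

With calls in those places, "1 + 2" would parse like "1+2", and trailing blanks after the last operand are consumed by the final operator test.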
<, > for shifting?
Consider << for shift and < for compare.
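A sketch of p3() that accepts only the two-character tokens, assuming the rest of the question's code stays as is. So that the block compiles on its own, p2() is replaced by a hypothetical stand-in that only reads a number. With this version a lone '<' or '>' is left unconsumed, so a lower-precedence comparison level could claim it later.

```c
#include <stdlib.h>

/* Stand-in for the question's p2(): here it only reads an unsigned number. */
static unsigned long p2(const char **const str) {
    char *end;
    unsigned long r = strtoul(*str, &end, 0);
    *str = end;
    return r;
}

/* Shift level accepting "<<" and ">>" only. Checking (*str)[1] is safe:
   it is read only when (*str)[0] is '<' or '>', i.e. not the terminator. */
static unsigned long p3(const char **const str) {
    unsigned long r = p2(str);
    for (;;) {
        if ((*str)[0] == '<' && (*str)[1] == '<') {
            *str += 2;
            r <<= p2(str);
        } else if ((*str)[0] == '>' && (*str)[1] == '>') {
            *str += 2;
            r >>= p2(str);
        } else {
            return r;
        }
    }
}
```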
Vertical spacing
Some vertical space would improve code readability.
Non-standard main() signature
Drop the const.
Some compilers may whine. Note: "The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program," C17dr § 5.1.2.2.1 2.
// int main(const int argc, const char **argv) {
int main(int argc, char **argv) {
Minor: Unneeded else
// if (isdigit(**str)) return strtoul(*str, (char **)str, 0);
// else switch (*(*str)++) {
if (isdigit(**str)) return strtoul(*str, (char **)str, 0);
switch (*(*str)++) {
Right-to-left bug?
Evaluation of unary plus/minus and bitwise complement looks left-to-right. I'd expect right-to-left.
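Whether this is a bug can be checked with an input where the two orders disagree. For -~0, right-to-left means -(~0), which is 1 in unsigned long arithmetic, while left-to-right would mean ~(-0), which is ULONG_MAX. The block below is a stripped-down copy of the question's p0() (a reduction, not code from the question; unary_only is a hypothetical name) that can be run on such inputs.

```c
#include <ctype.h>
#include <stdlib.h>

/* Stripped-down copy of the question's p0(): parentheses and error handling
   are removed so the unary recursion can be run on its own. */
static unsigned long unary_only(const char **str) {
    if (isdigit((unsigned char)**str)) {
        char *end;
        unsigned long v = strtoul(*str, &end, 0);
        *str = end;
        return v;
    }
    switch (*(*str)++) {
    /* Each operator is applied after the recursive call for its operand
       returns, so the operator nearest the number is applied first. */
    case '+': return +unary_only(str);
    case '-': return -unary_only(str);
    case '~': return ~unary_only(str);
    }
    return 0; /* sketch only: the real p0() reports an error here */
}
```

Run on "-~0" it returns 1, and on "~-0" it returns ULONG_MAX, so this reduction applies the operators right-to-left; the full p0() has the same recursive structure for these three operators.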
Subtle good code
Code uses parse(&(const char *){*argv}) rather than parse(argv). parse(argv) may modify *argv and that is potential UB.
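To make the point concrete, here is a sketch with a hypothetical stand-in for parse(), advance_digits(), which, like parse(), moves *str past what it consumes. Passing the address of a compound literal hands the callee its own pointer object to advance, so the argv element itself is never written.

```c
/* Hypothetical stand-in for parse(): like parse(), it advances *str past
   what it consumes (here, leading decimal digits). */
static void advance_digits(const char **str) {
    while (**str >= '0' && **str <= '9')
        ++*str;
}

/* Calls advance_digits() through an unnamed copy of arg, the way the
   question's main() calls parse(&(const char *){*argv}). The copy moves;
   arg still points at the start, so the difference is how far it got. */
static long consumed(const char *arg) {
    const char **copy = &(const char *){arg};
    advance_digits(copy);
    return (long)(*copy - arg);
}
```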
Minor: errno = 0
Best set just before the parse() call.
With the code as is, it makes no difference, yet from a template POV, it is better to see example code set errno = 0 explicitly.
errno = 0; // Add
const unsigned long r = parse(&(const char *){*argv});
printf() and errno
Unclear why code returns errno. printf() is not specified to set errno under any condition.
Consider:
if (printf("%lu\n", r) < 0)
    // return errno;
    return EXIT_FAILURE;
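The main() points above (indexing instead of scanning for the null pointer, no const on the parameters, clearing errno right before each parse, and returning EXIT_FAILURE when printf() fails) can be combined in one loop. This is a sketch under assumptions: run() and parse_stub() are hypothetical names, and parse_stub() stands in for the question's parse() by accepting only a plain number. Clearing errno before each call also helps because C permits library functions whose descriptions do not mention errno, printf() included, to set it to a nonzero value even on success (C17 § 7.5 3).

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the question's parse(): accepts only a plain
   number and sets errno on failure (EDOM, which ISO C defines). */
static unsigned long parse_stub(const char **str) {
    char *end;
    unsigned long r = strtoul(*str, &end, 0);
    if (end == *str || *end != '\0')
        errno = EDOM;
    *str = end;
    return r;
}

/* Body of main(); main itself would be:
       int main(int argc, char **argv) { return run(argc, argv); }         */
static int run(int argc, char **argv) {
    for (int i = 1; i < argc; i++) {          /* safe even when argc == 0 */
        errno = 0;                            /* cleared right before the parse */
        const unsigned long r = parse_stub(&(const char *){argv[i]});
        if (errno) {
            perror(argv[i]);
            continue;
        }
        if (printf("%lu\n", r) < 0)
            return EXIT_FAILURE;              /* not errno */
    }
    return EXIT_SUCCESS;
}
```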
argc is never 0. argv[0] is the program name... right? – CPlus, Feb 27, 2023 at 23:19
@user16217248 "argc is never 0" --> never is a long time. Handle argc equal to 0. Better to code to the spec than to experience alone. – chux, Feb 28, 2023 at 0:01
But... why? Is there any rationale for this seemingly pointless allowance of argc == 0? Would it really be hard to require that implementations with no concept of a program name pass a pointer to a null byte for argv[0] and make argc >= 1? – CPlus, Mar 1, 2023 at 15:52
Also, since parameters are local to the function in which they are declared, declaring argc as const is invisible to any implementation. – CPlus, Mar 1, 2023 at 15:56
I quoted some of your observations in the review of the latest question. – Mar 1, 2023 at 16:09