I stumbled upon a question on SO asking how to split a comma-separated string into individual values.
Since it's been a while since I've had any good reason to write C I'd like to ask for some feedback.
#include <stdio.h>
#include <string.h>
int main (int argc, const char * argv[]) {
const char str[] ="coma separated,input,,,some fields,,empty";
const char tok[] = ",";
char * tmp = (char *)str;
size_t count;
for (count=0; tmp[count]; tmp[count] == tok[0] ? count++ : * tmp++) {
//Empty loop body.
}
tmp = (char *)str;
for (size_t i = 0, l = 0; i < count; i++) {
l = strcspn (tmp, tok);
if (l == 0) {
printf("\"\"\n");
} else {
printf("\"%.*s\"\n", l, tmp);
}
tmp += sizeof(char) * (l + 1);
}
printf("\"%s\"\n", tmp);
return 0;
}
Output as expected:
"coma separated"
"input"
""
""
"some fields"
""
"empty"
3 Answers 3
sizeof(char)
by definition is 1.tmp += l + 1;
would be as good.I don't see a need to special case
l == 0
.You don't have to initialize
l
in a loop header: it is immediately reassigned.gcc
complains about precision specifier inprintf("\"%.*s\"\n", l, tmp)
beingsize_t
; unfortunately, cast is unavoidable.I don't see a need for 2 separate loops. A single
char * tmp = (char *)str; do { int l = strcspn (tmp, tok); printf("\"%.*s\"\n", l, tmp); tmp += l + 1; } while (tmp[-1]); return 0;
works equally well.
-
\$\begingroup\$ Thanks, I was wondering if I could avoid looping twice, much appreciated. \$\endgroup\$Etheryte– Etheryte2014年11月17日 06:37:13 +00:00Commented Nov 17, 2014 at 6:37
-
\$\begingroup\$ "
tmp += l + 1
; would be as good." astmp
is of pointer type that would be even better. Actually iftmp
was a pointer toint
(and all occurrences ofchar
were replaced withint
) as the code is now it would be incorrect! For more info search for materials about pointer arithmetic. \$\endgroup\$elmo– elmo2014年11月17日 10:52:37 +00:00Commented Nov 17, 2014 at 10:52
Solution
As @vnp noted, the "correct" solution is much simpler. I would make a few changes and write it as:
#include <stdio.h>
#include <string.h>
int main() {
const char string[] = "comma separated,input,,,some fields,,empty";
const char delims[] = ",";
const char *s = string;
do {
size_t field_len = strcspn(s, delims);
printf("\"%.*s\"\n", (int)field_len, s);
s += field_len;
} while (*s++);
}
Notable changes include:
- Friendlier variable names
- Don't cast away the
const
- Ensure that
printf()
gets anint
for the field width specifier - Smarter loop termination test (check for the NUL terminator then increment the pointer)
Nasty loop
Your original solution contains a loop that is particularly convoluted:
char * tmp = (char *)str; size_t count; for (count=0; tmp[count]; tmp[count] == tok[0] ? count++ : * tmp++) { //Empty loop body. } tmp = (char *)str;
As discussed previously,
- This loop, whose goal is to set
count
to the number of commas, turns out to be completely unnecessary. - Casting away
const
is bad.
In addition,
The loop manipulates both
count
andtmp
, which makes it very confusing. When the loop terminates,tmp
points tocount
bytes before the end of the string. That's just weird!The non-weird version would be:
int count = 0; for (const char *tmp = str; *tmp; tmp++) { if (*tmp == tok[0]) count++; }
Note the very conventional loop header: start at the beginning of
str
, walk one byte at a time until NUL is reached. Also,tmp
is truly temporary, and falls out of scope after the loop, so you don't have to worry about resetting it. The loop body is straightforward as well: incrementcount
for each comma.* tmp++
is a superfluous pointer dereference that just adds confusion and noise.
I'd avoid having l
as a variable name, also considering that variables shouldn't be named with just one character (excluding loop counter variables). It can be easy for a human to mistake it as 1
(one), although the compiler understands it.
Also, there is a typo in str[]
("coma" should be "comma").
,
and"
values? en.wikipedia.org/wiki/Comma-separated_values \$\endgroup\$