I'm new to C and want to split a string the way PHP's explode() function does. I looked for a built-in function in the C standard library and found strtok, but it doesn't support empty tokens, as in 1,2,3,,5. Inspired by the answers I found in this SO question, I wrote the function below. It is supposed to be thread-safe, to support empty tokens, and to leave the original string unchanged.
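To make the empty-token problem concrete, here is a small sketch that counts what strtok actually returns. The helper name countStrtokTokens is hypothetical. Because strtok treats a run of delimiters as a single separator, the empty field in 1,2,3,,5 simply disappears:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the tokens strtok() reports for `input`.
 * strtok() writes '\0' over separators, so it runs on a heap copy. */
size_t countStrtokTokens(const char *input, const char *delims)
{
    char *copy = malloc(strlen(input) + 1);
    if (copy == NULL) {
        return 0;
    }
    strcpy(copy, input);

    /* strtok() treats a run of separators as one, so the empty
     * field between ",," is never returned. */
    size_t count = 0;
    for (char *tok = strtok(copy, delims); tok != NULL; tok = strtok(NULL, delims)) {
        count++;
    }

    free(copy);
    return count;
}
```

Called as countStrtokTokens("1,2,3,,5", ","), it reports 4 tokens rather than the 5 fields PHP's explode() would give.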
    #include <stdlib.h>  /* malloc */
    #include <string.h>  /* strstr, strlen, memcpy */

    char* strTok(char** newString, char* delimiter)
    {
        char* string = *newString;
        char* delimiterFound = (char*) 0;
        int tokLenght = 0;
        char* tok = (char*) 0;
        if(!string) return (char*) 0;
        delimiterFound = strstr(string, delimiter);
        if(delimiterFound){
            tokLenght = delimiterFound-string;
        }else{
            tokLenght = strlen(string);
        }
        tok = malloc(tokLenght + 1);
        memcpy(tok, string, tokLenght);
        tok[tokLenght] = '\0';
        *newString = delimiterFound ? delimiterFound + strlen(delimiter) : (char*)0;
        return tok;
    }
I designed it to be used like this:

    char* input = "1,2,3,4,5,6,7,,,10,";
    char** inputP = &input;
    char* tok;
    while( (tok=strTok(inputP, ",")) ){
        printf("%s\n", tok);
    }
2 Answers
delimiterFound + strlen(delimiter) sounds like a bug. If the delimiter is longer than one character, *newString will point too far into the original string, maybe even beyond its end. Correct me if I am wrong, but delimiterFound + 1 is what you are actually after.

Modern C allows, and strongly encourages, declaring variables as close to their first use as possible. Consider

    char * delimiterFound = strstr(string, delimiter);
    ....
    char * tok = malloc(tokLenght + 1);

etc.

Always test that malloc didn't fail.

More spaces (around keywords, braces, etc.) definitely improve readability:

    if (....) {
        ....
    } else {
        ....
    }
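A sketch of how the question's function might look with this answer's points applied: declarations at first use, a checked malloc, and spaces around keywords and braces. It also uses NULL and the tokLength spelling. The guard against an empty delimiter is an addition of this sketch, not something the thread raised: strstr(s, "") returns s, so the original would yield empty tokens forever.

```c
#include <stdlib.h>
#include <string.h>

/* Same contract as the question's strTok(): returns a malloc()ed token
 * that the caller must free(), or NULL when the input is exhausted. */
char *strTok(char **newString, const char *delimiter)
{
    char *string = *newString;
    if (string == NULL) {
        return NULL;
    }

    /* Treat an empty delimiter as "no match" so the loop terminates. */
    char *delimiterFound = (*delimiter != '\0') ? strstr(string, delimiter) : NULL;
    size_t tokLength = delimiterFound ? (size_t)(delimiterFound - string)
                                      : strlen(string);

    char *tok = malloc(tokLength + 1);
    if (tok == NULL) {
        return NULL;    /* *newString is left unchanged on failure */
    }
    memcpy(tok, string, tokLength);
    tok[tokLength] = '\0';

    *newString = delimiterFound ? delimiterFound + strlen(delimiter) : NULL;
    return tok;
}
```

Returning NULL on allocation failure looks like end of input, but the two cases can be told apart after the loop: at the real end *newString is NULL, while after a failed malloc it still points into the string.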
Thank you very much for these valuable points. Regarding the delimiter-length bug: I want to support delimiters longer than one character, like the boundary string in HTTP requests with a multipart content type. I don't think it's a bug, because delimiterFound + strlen(delimiter) can never point past the 0 byte that terminates the original string, right? For example, in "fooDELIMITER" the delimiter is found at offset 3, and 3 + 9 = 12 is exactly the offset of the terminating 0 byte. (Accountant م, Apr 6, 2019 at 1:41)
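The reasoning in this comment can be checked directly. strstr only reports a match when all strlen(delimiter) bytes are present, so the resume offset is at most strlen(string) and lands, at worst, on the terminating 0 byte. The helper resumeOffset below is hypothetical and only mirrors the pointer arithmetic of the question's strTok:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the offset at which strTok() would resume after
 * the first match of `delimiter`, or -1 when there is no match. */
ptrdiff_t resumeOffset(const char *string, const char *delimiter)
{
    const char *delimiterFound = strstr(string, delimiter);
    if (delimiterFound == NULL) {
        return -1;
    }
    /* A match means every byte of `delimiter` lies inside `string`,
     * so this sum can be at most strlen(string). */
    return (delimiterFound - string) + (ptrdiff_t)strlen(delimiter);
}
```

For resumeOffset("fooDELIMITER", "DELIMITER") the result is 12, which equals strlen("fooDELIMITER").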
@Accountantم By long delimiters I mean something like ",;.", where each character delimits the string on its own. (vnp, Apr 6, 2019 at 1:52)
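The two readings of a multi-character delimiter differ as follows. With strtok-style semantics, which strcspn also follows, any single character from ",;." ends a token. With the strstr semantics the question's strTok uses, the whole string ",;." must appear contiguously. Both helper names here are hypothetical:

```c
#include <string.h>

/* Length of the leading token when ANY character of `delims` ends it
 * (the strtok()/strcspn() reading of ",;."). */
size_t tokenLenCharSet(const char *input, const char *delims)
{
    return strcspn(input, delims);
}

/* Length of the leading token when the WHOLE string `delimiter` must
 * appear (the strstr() reading used by the question's strTok()). */
size_t tokenLenSubstring(const char *input, const char *delimiter)
{
    const char *found = strstr(input, delimiter);
    return found ? (size_t)(found - input) : strlen(input);
}
```

On "a;b.c" the character-set reading stops after "a", while the substring reading finds no ",;." at all and returns the whole string.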
I didn't get it, sorry. Can you please give me an example input that breaks this code by exploiting this bug? (Accountant م, Apr 6, 2019 at 2:03)
@Accountantم Sorry for not being clear. I should have realized that your intentions are different (and read man strstr more carefully). Consider it my blunder: since you mentioned strtok, I expected strtok semantics. (vnp, Apr 6, 2019 at 2:09)
It's my fault, because I called it strtok. I wanted a strtok that supports delimiters longer than one character, because I need this feature a lot. So do you mean it's not a bug? (Accountant م, Apr 6, 2019 at 2:14)
From a readability viewpoint, you should use NULL instead of (char*) 0, as it makes it easier to recognize what you're trying to do. Also, tokLenght misspells "length" and should probably be tokLength.

You leak memory, as the memory allocated to hold the returned string is never freed.
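One way to avoid the leak altogether, offered here as an alternative design rather than the answer's suggestion, is to skip the allocation: report where each token starts in the original string and how long it is. The function nextToken and its parameters are hypothetical. The original string is still never modified, and there is nothing to free:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical non-allocating variant of strTok(): stores the start and
 * length of the next token and advances *cursor past the delimiter.
 * Returns 0 when the input is exhausted, 1 otherwise. */
int nextToken(const char **cursor, const char *delimiter,
              const char **tokStart, size_t *tokLength)
{
    const char *string = *cursor;
    if (string == NULL) {
        return 0;
    }

    /* An empty delimiter would match everywhere; treat it as no match. */
    const char *delimiterFound = (*delimiter != '\0') ? strstr(string, delimiter) : NULL;
    *tokStart = string;
    *tokLength = delimiterFound ? (size_t)(delimiterFound - string)
                                : strlen(string);
    *cursor = delimiterFound ? delimiterFound + strlen(delimiter) : NULL;
    return 1;
}
```

A caller can print a token without copying it via printf("%.*s\n", (int)len, start). The trade-off is that a token is no longer a 0-terminated string, so anything that needs one must copy it.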
Thank you very much. I will use NULL from now on, and I will remember to free() memory (I miss PHP's garbage collector :( ). I didn't get the tokLength spelling note, though; aren't they the same? (Accountant م, Apr 6, 2019 at 1:42)
It's a matter of style, and it requires including .h if you want to use NULL. However, you don't have to cast (char *)0; just use 0 (or NULL). The compiler converts it to a null pointer of the function's return type. (Neil, Apr 6, 2019 at 2:03)
@Accountantم For the spelling, you used G-H-T when the correct spelling is G-T-H (the last two letters are swapped). I've made that typo before. I find that correctly spelled identifiers are easier to read and to search for; IDE autocomplete mitigates this a little, but it also propagates the misspellings. (1201ProgramAlarm, Apr 6, 2019 at 14:12)
@1201ProgramAlarm Ooh 😮, how did I not notice this after revising it multiple times! Thanks. (Accountant م, Apr 6, 2019 at 14:38)
... strtok. You may be interested in strsep, too: code.woboq.org/userspace/glibc/string/strsep.c.html

... C libraries, but in POSIX (any type of gcc). However, like strtok, it obliterates the delimiter char by overwriting it with \0, so it's not the same.
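For comparison, here is a sketch of strsep in use. strsep is a BSD extension that glibc also provides; it is part of neither ISO C nor POSIX, and on glibc the _DEFAULT_SOURCE feature macro exposes it. Unlike strtok it does return empty fields. Like strtok, it overwrites each delimiter with '\0' and treats its delimiter argument as a set of single characters, so the sketch runs it on a heap copy. The helper name countStrsepFields is hypothetical:

```c
#define _DEFAULT_SOURCE   /* exposes strsep() in glibc's <string.h> */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the fields strsep() yields for `input`,
 * including empty ones. Works on a copy because strsep() writes '\0'
 * over every delimiter it consumes. */
size_t countStrsepFields(const char *input, const char *delims)
{
    char *copy = malloc(strlen(input) + 1);
    if (copy == NULL) {
        return 0;
    }
    strcpy(copy, input);

    size_t count = 0;
    char *cursor = copy;              /* strsep() advances this pointer */
    while (strsep(&cursor, delims) != NULL) {
        count++;
    }

    free(copy);
    return count;
}
```

On "1,2,3,,5" this counts 5 fields. On the question's "1,2,3,4,5,6,7,,,10," it counts 11, the same number the question's strTok produces.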