5
\$\begingroup\$

I'm new to C and want to split ("explode") a string the way PHP's explode() function does. I searched the C standard library for a built-in function and found strtok, but it doesn't support empty tokens, as in 1,2,3,,5. Inspired by the answers to this SO question, I wrote the function below. It is supposed to be thread safe, support empty tokens, and leave the original string unchanged.

char* strTok(char** newString, char* delimiter)
{
    char* string = *newString;
    char* delimiterFound = (char*) 0;
    int tokLenght = 0;
    char* tok = (char*) 0;
    if(!string) return (char*) 0;
    delimiterFound = strstr(string, delimiter);
    if(delimiterFound){
        tokLenght = delimiterFound-string;
    }else{
        tokLenght = strlen(string);
    }
    tok = malloc(tokLenght + 1);
    memcpy(tok, string, tokLenght);
    tok[tokLenght] = '\0';
    *newString = delimiterFound ? delimiterFound + strlen(delimiter) : (char*)0;
    return tok;
}

I designed it to be used like

char* input = "1,2,3,4,5,6,7,,,10,";
char** inputP = &input;
char* tok;
while( (tok=strTok(inputP, ",")) ){
 printf("%s\n", tok);
}
asked Apr 6, 2019 at 0:03
\$\endgroup\$
3
  • 1
    \$\begingroup\$ Better user-interface than the original strtok. You may be interested in strsep, too. code.woboq.org/userspace/glibc/string/strsep.c.html \$\endgroup\$ Commented Apr 6, 2019 at 1:59
  • \$\begingroup\$ @NeilEdelman thanks I never saw this function before, I will check it. \$\endgroup\$ Commented Apr 6, 2019 at 2:09
  • 1
    \$\begingroup\$ It's not in the standard C libraries, but in POSIX, (any type of gcc.) However, like strtok, it obliterates the char to replace it with \0, so it's not the same. \$\endgroup\$ Commented Apr 6, 2019 at 2:17

2 Answers

5
\$\begingroup\$
  • delimiterFound + strlen(delimiter) looks like a bug. If the delimiter is longer than one character, *newString will point too far into the original, maybe even beyond the end. Correct me if I am wrong, but delimiterFound + 1 is what you are actually after.

  • Modern C allows, and strongly encourages, declaring variables as close to their use as possible. Consider

    char * delimiterFound = strstr(string, delimiter);
    ....
    char * tok = malloc(tokLenght + 1);
    

    etc.

  • Always test that malloc didn't fail.

  • More spaces - around keywords, braces, etc - definitely improve readability:

     if (....) {
     ....
     } else {
     ....
     }
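
Putting these points together, a revised version might look like the sketch below. It is only an illustration under stated assumptions: the name strTokChecked, the const-qualified signature, and the renamed tokLength variable are not from the question. Note that returning NULL on allocation failure makes that case indistinguishable from end of input, and that an empty delimiter is not handled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical revision of the question's strTok (name and signature are
 * assumptions for this sketch). Returns a malloc'd copy of the next token,
 * or NULL when the input is exhausted or the allocation fails. */
char *strTokChecked(const char **newString, const char *delimiter)
{
    const char *string = *newString;
    if (!string) {
        return NULL;
    }

    /* Variables are declared where they are first needed. */
    const char *delimiterFound = strstr(string, delimiter);
    size_t tokLength = delimiterFound ? (size_t)(delimiterFound - string)
                                      : strlen(string);

    char *tok = malloc(tokLength + 1);
    if (!tok) {
        return NULL; /* allocation failed; *newString is left untouched */
    }
    memcpy(tok, string, tokLength);
    tok[tokLength] = '\0';

    /* Skip the whole delimiter, or signal the end of input with NULL. */
    *newString = delimiterFound ? delimiterFound + strlen(delimiter) : NULL;
    return tok;
}
```

The caller still owns each returned token and is expected to free() it.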
    
answered Apr 6, 2019 at 1:24
\$\endgroup\$
6
  • 1
    \$\begingroup\$ Thank you very much for these valuable points. Regarding the delimiter-length bug: I want to support delimiters longer than one character, like the boundary string in HTTP requests with a multipart content type. I don't think it's a bug, because delimiterFound + strlen(delimiter) can never point past the 0 byte that terminates the original string, right? "fooDELIMITER" 3 + 12 \$\endgroup\$ Commented Apr 6, 2019 at 1:41
  • \$\begingroup\$ @Accountantم Long delimiters here refer to, say, ",;.", where any character delimits the string in its own right. \$\endgroup\$ Commented Apr 6, 2019 at 1:52
  • \$\begingroup\$ I didn't get it, I'm sorry, can you please give me an example input that can break this code, exploiting this bug ? \$\endgroup\$ Commented Apr 6, 2019 at 2:03
  • 1
    \$\begingroup\$ @Accountantم Sorry for not being clear. I should have realized that your intentions are different (and read man strstr more carefully). Consider it my blunder: since you mentioned strtok, I expected the strtok semantics. \$\endgroup\$ Commented Apr 6, 2019 at 2:09
  • \$\begingroup\$ It's my fault, because I called it strtok. I wanted a strtok that supports delimiters longer than one character, because I need this feature a lot. So do you mean it's not a bug? \$\endgroup\$ Commented Apr 6, 2019 at 2:14
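
The two readings of "long delimiter" in these comments can be contrasted with a short sketch. The helper names are hypothetical and only measure the first token. With strstr, the delimiter must match as a whole substring, so a successful match lies entirely inside the string and delimiterFound + strlen(delimiter) can point at most to the terminating null byte. With strtok-style semantics, which strcspn reproduces here, any single character of the delimiter string ends the token.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the first token when the whole delimiter
 * string must match (the question's strstr-based behaviour). */
size_t firstTokenLenSubstring(const char *s, const char *delimiter)
{
    const char *found = strstr(s, delimiter);
    return found ? (size_t)(found - s) : strlen(s);
}

/* Hypothetical helper: length of the first token when any one character of
 * delimiters ends the token (strtok-style behaviour). */
size_t firstTokenLenCharSet(const char *s, const char *delimiters)
{
    return strcspn(s, delimiters);
}
```

For the input "a;b,;c" with the delimiter ",;", the substring reading yields a first token of length 3 ("a;b"), while the character-set reading yields length 1 ("a").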
4
\$\begingroup\$

From a readability viewpoint, you should use NULL instead of (char*) 0, as it is easier to recognize what you're trying to do. Also, tokLenght misspells "length" and should probably be tokLength.

You leak memory, as the memory allocated to hold the returned string is never freed.
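
One way to avoid the leak is to free each token as soon as the caller is done with it. The sketch below is a minimal illustration, not the asker's code: it carries a compact, const-qualified adaptation of strTok so that it compiles on its own, and printTokens is a hypothetical helper that returns the number of tokens it printed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Compact adaptation of the question's function (const-qualified and with a
 * malloc check; both are assumptions made for this sketch). */
char *strTok(const char **newString, const char *delimiter)
{
    const char *string = *newString;
    if (!string) return NULL;
    const char *delimiterFound = strstr(string, delimiter);
    size_t tokLength = delimiterFound ? (size_t)(delimiterFound - string)
                                      : strlen(string);
    char *tok = malloc(tokLength + 1);
    if (!tok) return NULL;
    memcpy(tok, string, tokLength);
    tok[tokLength] = '\0';
    *newString = delimiterFound ? delimiterFound + strlen(delimiter) : NULL;
    return tok;
}

/* Hypothetical helper: prints every token on its own line, frees it, and
 * returns how many tokens were seen. */
size_t printTokens(const char *text, const char *delimiter)
{
    const char *cursor = text;
    size_t count = 0;
    char *tok;
    while ((tok = strTok(&cursor, delimiter))) {
        printf("%s\n", tok);
        free(tok); /* each token is owned by the caller */
        count++;
    }
    return count;
}
```

Called with the question's input "1,2,3,4,5,6,7,,,10," and the delimiter ",", the loop sees eleven tokens, three of them empty, and releases each one before asking for the next.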

answered Apr 6, 2019 at 0:57
\$\endgroup\$
4
  • \$\begingroup\$ Thank you very much I will use NULL from now on, and I will remember to free() memory , 'I miss PHP garbage collector :(', I didn't get the tokLength spelling note, aren't they the same ? \$\endgroup\$ Commented Apr 6, 2019 at 1:42
  • 1
    \$\begingroup\$ It's a matter of style, and of including the appropriate header if you want to use NULL. However, you don't have to cast (char *)0, just use 0 (or NULL.) It knows from the return type. \$\endgroup\$ Commented Apr 6, 2019 at 2:03
  • 1
    \$\begingroup\$ @Accountantم For the spelling, you used G-H-T when the correct spelling is G-T-H (the last two letters are swapped). I've made that typo before. I find having identifiers spelled correctly helps with reading and finding them, although the autocomplete in IDEs mitigates that a little but propagates the misspellings. \$\endgroup\$ Commented Apr 6, 2019 at 14:12
  • \$\begingroup\$ @1201ProgramAlarm ooh, 😮 how come I didn't notice this after revising it multiple times!, thanks. \$\endgroup\$ Commented Apr 6, 2019 at 14:38
