I have programmed a function to split strings, and it gives the expected output. I am looking forward to write better code and I was told here is a good place to start.
Here's my program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char **split(char *str) {
int str_length = 0;
int i = 0;
int separator_count = 0;
while (str[i] != '0円') {
if (str[i] == ' ') {
separator_count++;
}
i++;
}
char **word_array;
word_array = malloc((separator_count + 2) * sizeof(char *));
int word_len = 0;
int separator_index = 0;
int b = 0; //to iterate over "str"
for (int a = 0; a < (separator_count + 2); a++) {
word_len = 0;
while (str_length < strlen(str) + 1) {
str_length++;
if (str[b] == ' ' || str[b] == '0円') {
separator_index = b;
word_array[a] = malloc(word_len * sizeof(char));
int word_depth = 0;
for (int c = (separator_index - word_len); c < separator_index; c++) {
chrctr = str[c];
word_array[a][word_depth] = str[c];
word_depth++;
}
word_array[a][word_len] = '0円'; //terminate found and stored word with null charachter
word_len = 0; //reset word length counter to 0
b++; //skip to next charachter
break; // break to go to next index of word_array
}
word_len++;
b++;
}
}
word_array[separator_count + 2] = NULL;
return word_array;
}
int main() {
char s[] = "A complete toolkit for building search into your product."; //test string
char **ss = split(s);
for (int i = 0; i < sizeof(ss); i++) { //here is the problem, and doing sizeof(ss)/sizeof(ss[0])=1 !!
for (int j = 0; j < strlen (ss[i]); j++) {
printf("%c", ss[i][j]);
}
printf ("\n");
}
}
Please comment with any advice or remarks on how to make this better.
3 Answers 3
Design: Code should explain cases
- Double space:
"abc xyz"
--> Is that to be 2 or 3 tokens? - Leading space:
" abc xyz"
--> Is that to be 2 or 3 tokens? - Trailing space:
"abc xyz "
--> Is that to be 2 or 3 tokens?
Design : Generalization
split()
only considers ' '
. Maybe all white-spaces, '\t'
, '\n'
, ...?
Or pass into the function a list of separators: split(char *str, const char *separators)
Design: const
As data referenced by str
does not change, consider char **split(const char *str)
for greater applicability and self documentation.
Bug: Insufficient memory
// word_array[a] = malloc(word_len * sizeof(char));
word_array[a] = malloc((word_len + 1u) * sizeof(char));
...
word_array[a][word_len] = '0円';
Bug: Writing outside allocation
word_array = malloc((separator_count + 2) * sizeof(char *));
...
word_array[separator_count + 2] = NULL; //OOPS!
// I suspect OP wanted
word_array[separator_count + 1] = NULL;
Allocate to the size of the reference object, not the type
// word_array = malloc((separator_count + 2) * sizeof(char *));
word_array = malloc(sizeof *word_array * (separator_count + 2u));
// word_array[a] = malloc(word_len * sizeof(char));
word_array[a] = malloc(sizeof (word_array[a][0] * word_len);
It is easier to code right, review and maintain.
Leading with the widest types in the multiplication prevents overflow - though that is not so important in this case.
Robust code checks allocation success
ptr = malloc(sizeof *ptr * n);
if (ptr == NULL && n > 0) {
HandleOutOfMemory(); // Some user code
}
Clean-up allocations
main()
called split(s)
which allocates, but failed to call a matching free()
. Not much concern with main()
as code ends anyways, but good practice.
Minor: int word_len
insufficient for large strings
// int word_len
size_t word_len
First, I'll address your issue within the main function. Where you've marked your issue, sizeof(ss)
isn't returning the number of elements, it's returning the size of the ss variable which is a pointer. It's the same as sizeof(char *)
(which is typically 8 on most platforms). It's a little confusing, but know that char s[]
and char *s
aren't the same. What you need is a way to retrieve the number of splits within the returned result. There are a number of ways you can do this, one of which would be to redefine the function signature to include the output and then return an integral type denoting the number of splits:
int split(const char *input, char **output);
However, there is also a way to do this without even knowing how many splits there are. You actually sort of have it implemented, but you don't use it. Similar to how the end of c-strings are detected, you can nullify the final byte of the array to indicate the end of the list (which I see you've already done at the end of the split function - there is an issue that you have with improperly indexing the final element, but I'll address that after). So, instead of your loop terminating at the condition i == num_splits
, you can simply terminate when a null pointer is detected:
int main() {
char s[] = "A complete toolkit for building search into your product."; //test string
char **ss = split(s);
for (int i = 0; ss[i]; i++) {
for (int j = 0; j < strlen (ss[i]); j++) {
printf("%c", ss[i][j]);
}
printf ("\n");
}
}
Making this amendment should fix your issue without having to change anything with the split function.
On the indexing issue you have (in several locations), you're off by one. Remembering that arrays are 0-indexed, the final element of an array is indexed by it's size minus 1. So, the lines
word_array[a][word_len] = '0円';
, and
word_array[separator_count + 2] = NULL;
are indexing past the allocated memory. The first line is actually fine, you just need to fix it in the allocation: word_array[a] = malloc((word_len + 1) * sizeof(char));
The second line should be changed to word_array[separator_count + 1] = NULL;
.
It's also very possible that I've missed some additional errors, but if I find them then I'll edit this answer or hopefully you or another user will notice and address them.
As far as your split function code, there are some things that would improve it a bit. I won't give you an exhaustive list, especially since some suggestions might be more on the subjective side, but here a some that I think you may find helpful:
- This one I believe is simply something you overlooked, but the
chrchr
variable is never declared (used in the linechrctr = str[c];
). I assume it's meant to be discarded anyways since you just usestr[c]
directly in the next line. - Using
strlen
in your while loop generally isn't good practice since the compiler may spit out code that calls it more than necessary. In fact, you can get rid of it entirely since you can retrieve the string length from your first loop where you count the number of separators (you already have the string length within thei
variable). - You can avoid printing each character at a time and, because you've nullified the final byte at the end of each word, print each word using
print("%s\n", ss[i]);
. - There's still more you can do to improve this, but I don't want to overload you with too much information. But I will end with a final note that you should consider a different structure than dynamically allocated arrays within a dynamically allocated array. All this dynamic allocation has to be cleaned up when it's done being used, and managing this memory just adds more complexity and possible complications (not to mention, it's very error prone).
The description, the name of split()
and the actual code are a bit contradicting. What you are doing is an array of pointers to chars (the words).
Since the amount of words is not known, one either has to count them in a first loop (as you did), or then start with a certain size and reallocate()
when the words keep coming. I just use a fixed size.
I found strtok()
which opens another possibility: just replace any delimiter by a 0円
and call it a string / return pointer to start. That means the string itself gets modified. No malloc()
needed for the strings, only to save the pointers (whose number I have limited to 1000).
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX 1000
char *words[MAX];
char *delim = " ";
void split(char *str) {
int i;
char *token = strtok(str, delim);
printf("%s %p\n", token, token);
while ((token = strtok(NULL, delim)) != NULL && i < MAX) {
printf("%s %p\n", token, token);
words[i++] = token;
}
}
int main(void) {
char s[] = "A complete toolkit for building search into your product.";
split(s);
printf("Fifth word is '%s' \n", words[5]);
return 0;
}
Output:
A 0x7ffd0ce68220
complete 0x7ffd0ce68222
toolkit 0x7ffd0ce6822b
for 0x7ffd0ce68233
building 0x7ffd0ce68237
search 0x7ffd0ce68240
into 0x7ffd0ce68247
your 0x7ffd0ce6824c
product. 0x7ffd0ce68251
Fifth word is 'into'
Your output is not correct (last line missing), and it is unclear what the goal of the program is.
My output is also incorrect...forgot to save the first token and to take index zero into account: 'into' is the 7th word.
-
\$\begingroup\$ An alternative to
strtok()
is paired use ofstrspn(), strcspn()
. Advantage:char *str
can now beconst char *str
and code does not mess up the caller's, maybe in the middle of, use ofstrok()
. \$\endgroup\$chux– chux2021年03月25日 01:21:02 +00:00Commented Mar 25, 2021 at 1:21 -
\$\begingroup\$ Bug: Try
printf("word is '%s' \n", words[0]);
after callingsplit(s);
. I get "word is '(null)'", not the hoped for "word is 'A'". \$\endgroup\$chux– chux2021年03月25日 01:25:48 +00:00Commented Mar 25, 2021 at 1:25 -
\$\begingroup\$ See existing last line in my answer: 5. vs. 7. word. "forgot to save first token". (before the loop). "Mess up caller": but then you could memcpy() first; more elegant than a separate malloc() for each word. A mere "Dynamic memory allocation" only in the title is not enough information. \$\endgroup\$user239275– user2392752021年03月25日 08:24:26 +00:00Commented Mar 25, 2021 at 8:24
Explore related questions
See similar questions with these tags.