I wrote a function that joins a collection of Strings with a delimiter. It's based on Java's version of the function. I'm assuming C has some similar built-in.
Example:
int main() {
char* coll[] = {"Hello", "World", "This", "Should", "Be", "Joined!"};
char* joined = awfulJoinStr(", ", coll, 6);
printf("%s\n", joined);
// Prints "Hello, World, This, Should, Be, Joined!"
}
Basically how it works is, it alternates between iterating the collection strings and the delimiter. It writes the current word to the buffer until it hits a null terminator, then it switches to either writing the next word, or to the delimiter.
I'm new to C, so I'd like comments on anything in the code. My main concerns:
It's super inefficient. All the strings need to be iterated at the start to get a total length so I know how big of a buffer to allocate later. I can't see how else I'd do it though unless I do reallocations while writing, but that seems like it would be even more inefficient.
It requires three iteration counters (
stringI
,charI
, andbufferI
) and a flag. I need to keep track of the current word that I'm writing, my position in it, and need to track where I am in the buffer. I also need a flag to track when I'm currently writing a delimiter or word.It requires passing in how many strings I want to join (
nStrings
). This seems like it's necessary as well since I can't know the length ofstrArr
, but it's unfortunate.It's huge. It doesn't fit in one screen at a comfortable zoom.
I also included my version of strlen
for kicks, and because I used it instead of the built-in.
#include <stdlib.h>
#include <stdbool.h>
size_t myStrLen(char* str) {
size_t length = 0;
while (str[length] != '0円') {
length++;
}
return length;
}
char* awfulJoinStr(char* delimiter, char** strArr, size_t nStrings) {
size_t delimiterSize = myStrLen(delimiter);
size_t totalDelimiterSize = (nStrings - 1) * delimiterSize;
size_t totalStringSize = 0;
for (size_t i = 0; i < nStrings; i++) {
totalStringSize += myStrLen(strArr[i]);
}
size_t bufferSize = totalDelimiterSize + totalStringSize + 1;
char* buffer = malloc(sizeof(char) * bufferSize);
buffer[bufferSize - 1] = '0円';
bool inDelim = false;
size_t charI = 0;
size_t stringI = 0;
for (size_t bufferI = 0; bufferI < bufferSize - 1;) {
char current = inDelim ? delimiter[charI] : strArr[stringI][charI];
if (current == '0円') {
// Start at the beginning of the word and toggle what we're writing
charI = 0;
if (!inDelim) {
stringI++;
}
inDelim = !inDelim;
} else {
buffer[bufferI] = current;
charI++;
bufferI++;
}
}
return buffer;
}
2 Answers 2
I see a number of things that may help you improve your code.
Use standard library functions where appropriate
Except as a learning exercise (which I recognize this probably is), there's little reason to write duplicates of existing, well-tested, well-documented and often highly optimized library functions. For that reason, in real code, I'd expect strlen()
to be used rather than myStrLen()
.
Don't leak memory
Unlike Java, in C it is the responsibility of the programmer to clean up memory allocations. For that reason, I'd expect the sample code to have the line free(joined);
before the end of main
.
Use const
where appropriate
The delimiter
and strArr
arguments to awfulJoinStr
are not and should not be modified by that function, so it would be better to indicate that fact and change the parameter list to this:
char* awfulJoinStr(const char* delimiter, const char** strArr, size_t nStrings)
Consider using a sentinel value
Generally, there are two ways to tell a function the size of a list. Either it can be passed explicitly as you do in the current code, or one can use a sentinel value which is simply a special "end of list" marker value. In this case, I think it might be more convenient to use a sentinel value of NULL
. So your coll
could look like this:
const char* coll[] = {"Hello", "World", "This", "Should", "Be", "Joined!", NULL};
And now the code could look for the special NULL
value instead of having to pass a count.
Check for errors
The call to malloc
could fail. The only indication you would have of this would be that it would return NULL
, so for that reason, you should check for a NULL
return from malloc
before using the returned pointer or risk having your code crash.
Also, if nStrings
is zero, the calculated totalDelimiterSize
is huge and the result will certainly not be what you want, even if it doesn't crash.
Simplify code by knowing the standard well
The current code includes this line:
char* buffer = malloc(sizeof(char) * bufferSize);
However, the standard defines sizeof(char)
as always being equal to 1
so this could and should be written like this instead:
char* buffer = malloc(bufferSize);
Bail out early on error
If the parameters to the function are faulty, or if there are no passed strings, it would probably make most sense to bail out early, rather than doing a bunch of work that would ultimately be thrown away.
Understand the use of pointers
A fundamental (but not always easy!) concept in C programming is the pointer. Understanding how to use pointers effectively is an important skill to learn when mastering C programming. Examples as applied to this program are shown in the code in the next section.
Simplify your code by decomposing into functions
Here's one way to rewrite the function using most of what's suggested above:
char* awfulJoinStr(const char* delimiter, const char** strArr) {
if (strArr == NULL || delimiter == NULL) {
return NULL;
}
// first calculate the total length of the strings
size_t bufferSize = 0;
size_t nStrings = 0;
for (const char **str = strArr; *str != NULL; ++str) {
bufferSize += strlen(*str);
++nStrings;
}
if (nStrings == 0) {
return NULL;
}
size_t delimiterSize = strlen(delimiter);
bufferSize += (nStrings - 1) * delimiterSize + 1;
char* buffer = malloc(bufferSize);
if (buffer == NULL) {
return NULL;
}
buffer[bufferSize - 1] = '0円';
for (char *curr = scopy(buffer, *strArr++); *strArr; ++strArr) {
curr = scopy(curr, delimiter);
curr = scopy(curr, *strArr);
}
return buffer;
}
As you see, the copying is done via a custom string copy function scopy
. We use this instead ofthe standard strcpy
because strcpy
returns the start of the destination string, but this code requires one past the end instead. Here's scopy
:
char *scopy(char *dest, const char *src) {
while (*src) {
*dest++ = *src++;
}
return dest;
}
This short function uses a lot of specialized C knowledge. First, the passed parameters are both pointers. We can alter the pointers without necesarily altering what they're pointing to. Second is the idea of post-increment versus pre-increment. This line is a very common C idiom:
*dest++ = *src++;
What it means in English is, in effect, "get the thing that src
is pointing to and copy that to wherever dest
is pointing; then increment both pointers." (That's not exactly the sequence of events, but it will suffice to understand the intent of this line.)
Regarding:
char* buffer = malloc(sizeof(char) * bufferSize);
Always check (!=NULL) the returned value to assure the operation was successful. If not successful, then call
perror( "my error message" );
to notify the user that a problem occurred and what the system thinks is the cause of the problem. Then, most likely, clean up (close files, etc) then call:exit( EXIT_FAILURE );
The expression:
sizeof( char )
is defined in the C standard as 1. Multiplying anything by 1 has no effect. I suggest removing that expression.
-
2\$\begingroup\$
perror()
is not appropriate for amalloc()
failure, aserrno
is not set by that. Use a plainfprintf(stderr, "Failed memory allocation");
\$\endgroup\$Toby Speight– Toby Speight2019年04月10日 09:38:51 +00:00Commented Apr 10, 2019 at 9:38 -
\$\begingroup\$ @TobySpeight, This is an excerpt from the MAN page. Note that it does set
errno
. "ERRORS calloc(), malloc(), realloc(), and reallocarray() can fail with the following error: ENOMEM Out of memory." So, the function:perror()
is a valid function to call whenmalloc()
fails \$\endgroup\$user3629249– user36292492019年04月11日 08:43:18 +00:00Commented Apr 11, 2019 at 8:43 -
\$\begingroup\$ POSIX does indeed specify that
errno
is set. I don't have a C standard handy, but I'm pretty sure that it doesn't mentionerrno
withmalloc()
- meaning that non-POSIX platforms may or may not set it. I'm told that Windows and MacOS default libraries do seterrno
, so if you only care about those and POSIX, then you may file that recommendation under "pedantry"... \$\endgroup\$Toby Speight– Toby Speight2019年04月11日 12:54:27 +00:00Commented Apr 11, 2019 at 12:54