Random String generator in C

Question 1

I created this small function just to practice C code. It's a simple random string generator.

#include <string.h>
#include <time.h>
char *randstring(int length) { 
 static int mySeed = 25011984;
 char *string = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.-#'?!";
 size_t stringLen = strlen(string); 
 char *randomString = NULL;
 srand(time(NULL) * length + ++mySeed);
 if (length < 1) {
 length = 1;
 }
 randomString = malloc(sizeof(char) * (length +1));
 if (randomString) {
 short key = 0;
 for (int n = 0;n < length;n++) { 
 key = rand() % stringLen; 
 randomString[n] = string[key];
 }
 randomString[length] = '0円';
 return randomString; 
 }
 else {
 printf("No memory");
 exit(1);
 }
}

The code seems to work ok. Any ideas, improvements or bugs?

I added the mySeed var so that if I call it twice with the same length it doesn't give me the same exact string.

EDIT:

I have changed the code to this:

char *randstring(size_t length) {
 static char charset[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.-#'?!"; 
 char *randomString = NULL;
 if (length) {
 randomString = malloc(sizeof(char) * (length +1));
 if (randomString) { 
 for (int n = 0;n < length;n++) { 
 int key = rand() % (int)(sizeof(charset) -1);
 randomString[n] = charset[key];
 }
 randomString[length] = '0円';
 }
 }
 return randomString;
}

I know that in the sizeof(charset) you don't have to use the (). You only need them when using sizeof with types, but it's just out of habit.

Question 2

Your function is nice but has a few issues, the main one being that it should not call srand. srand should be called elsewhere (eg in main) just once. This seeds the random number generator, which need only be done once.

A minor issue is that string is badly named - charset might be better. It should be const and you then need not call strlen to find its length sizeof charset -1 is enough. For me, randomString is an unnecessarily long name.

On failing to allocate memory for the string, I would prefer to see a NULL return than an exit. If you want an error message, use perror, but perhaps in the caller, not here. I would be inclined to avoid the possibility of such an error but passing in the buffer and its length instead of allocating.

Some minor points: sizeof(char) is 1 by definition and using short for key is pointless - just use int. Also key should be defined where it is used and I would leave a space after the ; in the for loop definition.

Note also that using rand() % n assumes that the modulo division is random - that is not what rand promises.

Here is how I might do it:

static char *rand_string(char *str, size_t size)
{
 const char charset[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJK...";
 if (size) {
 --size;
 for (size_t n = 0; n < size; n++) {
 int key = rand() % (int) (sizeof charset - 1);
 str[n] = charset[key];
 }
 str[size] = '0円';
 }
 return str;
}

Edit July 31 23:07UTC

Why would I write the function to take a buffer instead of allocating the string inside the function?

Returning dynamically allocated strings works fine. And if the memory is later freed there is no problem. But writing this sort of function is a great way to leak memory, for example if the caller doesn't know the memory must be freed or forgets to free memory, or even if he frees it in the main path but forgets to free it in other paths etc.

Memory leaks in a desktop applications might not be fatal, but leaks in an embedded system will lead eventually to failure. This can be serious, depending upon the system involved. In many embedded systems, dynamic allocation is often not allowed or is at least best avoided.

Although it certainly is common not to know the size of strings or buffers at compile time, the opposite is also often true. It is often possible to write code with fixed buffer sizes. I always prefer this option if possible so I would be reluctant to use your allocating function. Perhaps it is better to add a wrapper to a non-allocating function for those cases where you really must allocate dynamically (for example when the random string has to outlive the calling context):

char* rand_string_alloc(size_t size)
{
 char *s = malloc(size + 1);
 if (s) {
 rand_string(s, size);
 }
 return s;
}

Question 3

note that you don't need to return any result because the str parameter is mutable

Question 4

That is true but many standard library functions do this too. It can make life easier, eg in printf("%s\n", rand_string(buf, sizeof buf));

Question 5

your right, if you speak of easy life you grab c++,java,go,python,haskell,ruby or d unless your really need the last 100%-20% performance or your forced to use c

Question 6

The questioner is asking specifically about C, not C++, not Java, not go, python etc. Thanks for your input.

Question 7

@WilliamMorris Could you provide some more insight on why you pass the buffer and not just create the buffer in the string with malloc and return it? Is it because this way it would be easier to free the memory? Is it not possible to free the returned pointer? Thanks.

Question 8

Wanted to also give a small input. I had to implement a very similar function for a project of mine. I can't disclose the algorithm, however. But let me try to give you a couple small hints as to how you could further improve yours. I'm foremost concerned with stability and performance (low latency and high throughput) of code.

Let's start with this (profiled for a 1024-byte long string for 10,000 iterations with gprof):

Each sample counts as 0.01 seconds.
 % cumulative self self total 
 time seconds seconds calls us/call us/call name 
 67.16 0.02 0.02 10000 2.01 2.01 randstring
 33.58 0.03 0.01 10000 1.01 1.01 mkrndstr
 0.00 0.03 0.00 10000 1.01 1.01 mkrndstrperf

The difference is here really thin as on a few repetitions gprof gives random data, so let's take callgrind instead.

The total cost is then more precisely calculated in function calls:

 7,986,402,268 ???:randstring [~perf/t/mkrndstr-notmine]
 6,655,222,307 ???:mkrndstr [~perf/t/mkrndstr-notmine]
 6,653,436,779 ???:mkrndstr_ipa [~perf/t/mkrndstr-notmine]
 6,653,436,778 ???:mkrndstr_ipd [~perf/t/mkrndstr-notmine]

For a 10-byte long string and 10,000 calls, the relative difference is even bigger:

 9,968,042 ???:randstring [~perf/t/mkrndstr-notmine]
 8,646,775 ???:mkrndstr [~perf/t/mkrndstr-notmine]
 6,716,774 ???:mkrndstr_ipa [~perf/t/mkrndstr-notmine]
 6,716,774 ???:mkrndstr_ipd [~perf/t/mkrndstr-notmine]

Here, mkrndstr_ip{a,d} means: in-place for automatically stored and dynamically stored data (different calls, identical functions), respectively.

Some key take-aways beforehand:

down-casting

size_t l = (sizeof(charset) -1); // uncast

versus

int l = (int)(sizeof(charset) -1); // cast to int as suggested in William Morris' reply

makes on that scale a big difference--you avoid passing around loads of superfluous bytes.

The static set of chars is a good idea, saves many cycles and calls, I'd prefix it with a const qualifier.
It's a bad idea to do per-cycle/per-iteration instantiations and identical calculations for the same reasons as in 2 above, make the value, loosely speaking, sticky as Quonux demonstrates it.
Considering randomness. I guess you know libc's rand() is an implementation of a PRNG. It's useless for anything serious. If your code gets executed too quickly, i.e., when there's too little interval between any two successive calls to rand(), you'll get the same character from the set. So make sure to pause for a couple CPU cycles. You could also simply read chunks from /dev/urandom (with the u prefix, otherwise my "simply" wouldn't hold) or similar on a UNIX derivative. Don't know how to access the random device on Windows.
strlen is indeed slower than sizeof for clear reasons (expects complex strings), see the implementation in glibc sources, for example, in string/strlen.c.
The rest is in the comments.

Let's get to the code:

Yours, final:

char *randstring(size_t length) { // length should be qualified as const if you follow a rigorous standard
static char charset[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.-#'?!"; 
char *randomString; // initializing to NULL isn't necessary as malloc() returns NULL if it couldn't allocate memory as requested
if (length) {
 randomString = malloc(length +1); // I removed your `sizeof(char) * (length +1)` as sizeof(char) == 1, cf. C99
 if (randomString) { 
 for (int n = 0;n < length;n++) { 
 int key = rand() % (int) (sizeof(charset) -1);
 randomString[n] = charset[key];
 }
 randomString[length] = '0円';
 }
}
return randomString;
}

Mine (only slight optimizations):

char *mkrndstr(size_t length) { // const size_t length, supra
static char charset[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.-#'?!"; // could be const
char *randomString;
if (length) {
 randomString = malloc(length +1); // sizeof(char) == 1, cf. C99
 if (randomString) {
 int l = (int) (sizeof(charset) -1); // (static/global, could be const or #define SZ, would be even better)
 int key; // one-time instantiation (static/global would be even better)
 for (int n = 0;n < length;n++) { 
 key = rand() % l; // no instantiation, just assignment, no overhead from sizeof
 randomString[n] = charset[key];
 }
 randomString[length] = '0円';
 }
}
return randomString;
}

Now considering your question regarding dynamic versus automatic data storage:

void mkrndstr_ipa(size_t length, char *randomString) { // const size_t length, supra
static char charset[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.-#'?!";
if (length) {
 if (randomString) {
 int l = (int) (sizeof(charset) -1);
 for (int n = 0;n < length;n++) {
 int key = rand() % l; // per-iteration instantiation
 randomString[n] = charset[key];
 }
 randomString[length] = '0円';
 }
}
}

There's also an identical function with the modified name as stated way above. Both are called like this:

char *c = malloc(SZ_STR +1); // dynamic, in C on the heap
char d[SZ_STR +1]; // "automatic," in C on the stack
mkrndstr_ipd(SZ_STR, c);
mkrndstr_ipa(SZ_STR, d);

Now the more interesting part:

 .globl randstring
 .type randstring, @function
randstring:
.LFB0:
 .cfi_startproc
 pushq %rbp
 .cfi_def_cfa_offset 16
 .cfi_offset 6, -16
 movq %rsp, %rbp
 .cfi_def_cfa_register 6
 pushq %rbx
 subq 40,ドル %rsp
 .cfi_offset 3, -24
 call mcount
 movq %rdi, -40(%rbp)
 cmpq 0,ドル -40(%rbp)
 je .L2
 movq -40(%rbp), %rax
 addq 1,ドル %rax
 movq %rax, %rdi
 call malloc
 movq %rax, -24(%rbp)
 cmpq 0,ドル -24(%rbp)
 je .L2
 movl 0,ドル -28(%rbp)
 jmp .L3
.L4:
 call rand
 movl %eax, %ecx
 movl 1991868891,ドル %edx
 movl %ecx, %eax
 imull %edx
 sarl 5,ドル %edx
 movl %ecx, %eax
 sarl 31,ドル %eax
 movl %edx, %ebx
 subl %eax, %ebx
 movl %ebx, %eax
 movl %eax, -32(%rbp)
 movl -32(%rbp), %eax
 imull 69,ドル %eax, %eax
 movl %ecx, %edx
 subl %eax, %edx
 movl %edx, %eax
 movl %eax, -32(%rbp)
 movl -28(%rbp), %eax
 movslq %eax, %rdx
 movq -24(%rbp), %rax
 addq %rax, %rdx
 movl -32(%rbp), %eax
 cltq
 movzbl charset.1808(%rax), %eax
 movb %al, (%rdx)
 addl 1,ドル -28(%rbp)
.L3:
 movl -28(%rbp), %eax
 cltq
 cmpq -40(%rbp), %rax
 jb .L4
 movq -40(%rbp), %rax
 movq -24(%rbp), %rdx
 addq %rdx, %rax
 movb 0,ドル (%rax)
.L2:
 movq -24(%rbp), %rax
 addq 40,ドル %rsp
 popq %rbx
 popq %rbp
 .cfi_def_cfa 7, 8
 ret
 .cfi_endproc
.LFE0:
 .size randstring, .-randstring

compared with:

 .globl mkrndstr
 .type mkrndstr, @function
mkrndstr:
.LFB1:
 .cfi_startproc
 pushq %rbp
 .cfi_def_cfa_offset 16
 .cfi_offset 6, -16
 movq %rsp, %rbp
 .cfi_def_cfa_register 6
 subq 48,ドル %rsp
 call mcount
 movq %rdi, -40(%rbp)
 cmpq 0,ドル -40(%rbp)
 je .L7
 movq -40(%rbp), %rax
 addq 1,ドル %rax
 movq %rax, %rdi
 call malloc
 movq %rax, -8(%rbp)
 cmpq 0,ドル -8(%rbp)
 je .L7
 movl 69,ドル -16(%rbp)
 movl 0,ドル -12(%rbp)
 jmp .L8
.L9:
 call rand
 movl %eax, %edx
 sarl 31,ドル %edx
 idivl -16(%rbp)
 movl %edx, -20(%rbp)
 movl -12(%rbp), %eax
 movslq %eax, %rdx
 movq -8(%rbp), %rax
 addq %rax, %rdx
 movl -20(%rbp), %eax
 cltq
 movzbl charset.1818(%rax), %eax
 movb %al, (%rdx)
 addl 1,ドル -12(%rbp)
.L8:
 movl -12(%rbp), %eax
 cltq
 cmpq -40(%rbp), %rax
 jb .L9
 movq -40(%rbp), %rax
 movq -8(%rbp), %rdx
 addq %rdx, %rax
 movb 0,ドル (%rax)
.L7:
 movq -8(%rbp), %rax
 leave
 .cfi_def_cfa 7, 8
 ret
 .cfi_endproc
.LFE1:
 .size mkrndstr, .-mkrndstr

I believe an interpretation is not necessary here, the disassembly is then only for visualisation. Compare the different number of instructions.

What I wanted to show is that just a couple of tiny nip'n'tucks does wonders. We sure could go memmove the string into an external buffer at a fixed address or do n-byte-wise assignments. We could also put the set in a macro or use even more static allocation. But let's not exaggerate and go write this outright in ASM, and when we'd be done with it, do the same in pure machine code.

I hope this helps and if it's not a real-time system, then don't do much hassle with the optimizations, stability and reliability (like rand()) should go first. If you optimize this and that out, something else somewhere else could break. There are a few open-source projects online that show how to optimize excessively but at the cost of an unmaintanable code for newcomers to their teams and sheer complexity for the more accustomed developers.

Last thing I'd like to point you at is that

error: ‘for’ loop initial declarations are only allowed in C99 mode

Question 9

I would not expect either putting the size into a variable or defining the key variable outside the loop to make any difference. The compiler can optimise this away. With gcc using -O2 the two functions (randstring, mkrndstr) give identical assembler. The same is true of clang. How are you compiling (which compiler/options)?

Question 10

Wow. Thanks for such an in-depth analysis of my code.

Question 11

@William: I generally test performance without any compiler optimizations. -O2, in my opinion, does not always produce stable binaries, it's more about compatibility of your code and the optimizations specified to the compiler. That was an assembly by gcc. Besides, not always do I run my code on x86 machines, so it may differ again. When I do fine-tuning, I manually pick optimizations instead of the predefined bundles. I prefer to get my code as I want it instead of trusting some compiler that it would improve it. By the way, this also applies to interpreted languages like Perl and Python.

Question 12

The thing is, code that looks shorter in assembly is not necessarily quicker. Your optimisation of moving the size of the character set to a variable causes the compiler (gcc on IA32) to emit an idiv instruction instead of using its most efficient method of dividing by sizeof charset - 1 (69). Testing just this division with gcc on OS-X and Linux, your code, which uses idiv, is shorter but takes twice as long. With -O2 and with or without a separate variable for the size, the code is shorter still and takes a quarter of the time of the unoptimised idiv code.

Question 13

@AntinioCS, note that worrying about performance of any particular function is best avoided unless you find you have performance problems. In that case you should find the performance bottlenecks and optimise just those.

Question 14

the code doesn't look that wrong, but

the caller should allways check for errors
if (length < 1) test unnecessary
srand(time(NULL) should be done outside
error catching (malloc) should open no new block
short isn't really at the high of the time, use it only for serilizing/deserilizing raw binary files/network data
use unsigned if it makes sense, the counter can't be negative and this is not lala-java-land (besides it can be a bit faster but most times it doesn't matter) (side note, yes i am damaged from looking at the assembler output of the compilers)

you don't need to recalculate the string length everytime, with C++0x you can maybe let it calculate from the compiler on compiletime

#include <string.h>
#include <time.h>
char *randstring(int length) { 
 char *string = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.-#'?!";
 size_t stringLen = 26*2+10+7; 
 char *randomString;
 randomString = malloc(sizeof(char) * (length +1));
 if (!randomString) {
 return (char*)0;
 }
 unsigned int key = 0;
 for (int n = 0;n < length;n++) { 
 key = rand() % stringLen; 
 randomString[n] = string[key];
 }
 randomString[length] = '0円';
 return randomString;
}

Question 15

The unsigned (size_t to be more specific) applies to the size parameter as well.

Question 16

There is one problem with this code: it doesn't generate a random string each time the program is run. The code is missing srand(time(0));. The srand() function sets the starting point for producing a series of pseudo-random integers. If srand() is not called, the rand() seed is set as if srand(1) were called at program start. Any other value for seed sets the generator to a different starting point.

For more details, read rand() and srand() in C/C++

Question 17

As William Morris points out (first paragraph), it does call srand() - unfortunately it does so on each call of the function, rather than once in the main().

Question 18

Yeah William pointed that out, and it doesn't need to call srand() every function call once in main() is fine.

Question 19

I have a suggestion. It's ok to use a charset but I would recommend using ASCII values to save some memory (not that it matters too much). You can check out ASCII values here, for example, https://www.asciitable.com/ and then you set the interval of numbers you want. In the example below I will use 32-126. Also, as it was already said, you should call srand on main or something, since it should be called only once.

What I mean is something like:

#define ASCII_START 32
#define ASCII_END 126
char* generateRandomString(int size) {
 int i;
 char *res = malloc(size + 1);
 for(i = 0; i < size; i++) {
 res[i] = (char) (rand()%(ASCII_END-ASCII_START))+ASCII_START;
 }
 res[i] = '0円';
 return res;
}
//srand is not visible, as I said it could be on main.

William Morris William Morris 9,39419 silver badges43 bronze badges · Accepted Answer · 2013-07-30 19:56:37Z

Your function is nice but has a few issues, the main one being that it should not call srand. srand should be called elsewhere (eg in main) just once. This seeds the random number generator, which need only be done once.

A minor issue is that string is badly named - charset might be better. It should be const and you then need not call strlen to find its length sizeof charset -1 is enough. For me, randomString is an unnecessarily long name.

On failing to allocate memory for the string, I would prefer to see a NULL return than an exit. If you want an error message, use perror, but perhaps in the caller, not here. I would be inclined to avoid the possibility of such an error but passing in the buffer and its length instead of allocating.

Some minor points: sizeof(char) is 1 by definition and using short for key is pointless - just use int. Also key should be defined where it is used and I would leave a space after the ; in the for loop definition.

Note also that using rand() % n assumes that the modulo division is random - that is not what rand promises.

Here is how I might do it:

static char *rand_string(char *str, size_t size)
{
 const char charset[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJK...";
 if (size) {
 --size;
 for (size_t n = 0; n < size; n++) {
 int key = rand() % (int) (sizeof charset - 1);
 str[n] = charset[key];
 }
 str[size] = '0円';
 }
 return str;
}

Edit July 31 23:07UTC

Why would I write the function to take a buffer instead of allocating the string inside the function?

Returning dynamically allocated strings works fine. And if the memory is later freed there is no problem. But writing this sort of function is a great way to leak memory, for example if the caller doesn't know the memory must be freed or forgets to free memory, or even if he frees it in the main path but forgets to free it in other paths etc.

Memory leaks in a desktop applications might not be fatal, but leaks in an embedded system will lead eventually to failure. This can be serious, depending upon the system involved. In many embedded systems, dynamic allocation is often not allowed or is at least best avoided.

Although it certainly is common not to know the size of strings or buffers at compile time, the opposite is also often true. It is often possible to write code with fixed buffer sizes. I always prefer this option if possible so I would be reluctant to use your allocating function. Perhaps it is better to add a wrapper to a non-allocating function for those cases where you really must allocate dynamically (for example when the random string has to outlive the calling context):

char* rand_string_alloc(size_t size)
{
 char *s = malloc(size + 1);
 if (s) {
 rand_string(s, size);
 }
 return s;
}

note that you don't need to return any result because the str parameter is mutable
That is true but many standard library functions do this too. It can make life easier, eg in printf("%s\n", rand_string(buf, sizeof buf));
your right, if you speak of easy life you grab c++,java,go,python,haskell,ruby or d unless your really need the last 100%-20% performance or your forced to use c
The questioner is asking specifically about C, not C++, not Java, not go, python etc. Thanks for your input.
@WilliamMorris Could you provide some more insight on why you pass the buffer and not just create the buffer in the string with malloc and return it? Is it because this way it would be easier to free the memory? Is it not possible to free the returned pointer? Thanks.

Stack Exchange Network

Random String generator in C

5 Answers 5

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Random String generator in C

5 Answers 5

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions