I created this small function just to practice C code. It's a simple random string generator.
#include <string.h>
#include <time.h>
char *randstring(int length) {
static int mySeed = 25011984;
char *string = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.-#'?!";
size_t stringLen = strlen(string);
char *randomString = NULL;
srand(time(NULL) * length + ++mySeed);
if (length < 1) {
length = 1;
}
randomString = malloc(sizeof(char) * (length +1));
if (randomString) {
short key = 0;
for (int n = 0;n < length;n++) {
key = rand() % stringLen;
randomString[n] = string[key];
}
randomString[length] = '0円';
return randomString;
}
else {
printf("No memory");
exit(1);
}
}
The code seems to work ok. Any ideas, improvements or bugs?
I added the mySeed
var so that if I call it twice with the same length it doesn't give me the same exact string.
EDIT:
I have changed the code to this:
char *randstring(size_t length) {
static char charset[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.-#'?!";
char *randomString = NULL;
if (length) {
randomString = malloc(sizeof(char) * (length +1));
if (randomString) {
for (int n = 0;n < length;n++) {
int key = rand() % (int)(sizeof(charset) -1);
randomString[n] = charset[key];
}
randomString[length] = '0円';
}
}
return randomString;
}
I know that in the sizeof(charset)
you don't have to use the ()
. You only need them when using sizeof with types, but it's just out of habit.
5 Answers 5
Your function is nice but has a few issues, the main one being that it
should not call srand
. srand
should be called elsewhere (eg in main
)
just once. This seeds the random number generator, which need only be done
once.
A minor issue is that string
is badly named - charset
might be better.
It should be const
and you then need not call strlen
to find its length
sizeof charset -1
is enough. For me, randomString is an unnecessarily
long name.
On failing to allocate memory for the string, I would prefer to see a NULL
return than an exit
. If you want an error message, use perror
, but
perhaps in the caller, not here. I would be inclined to avoid the
possibility of such an error but passing in the buffer and its length
instead of allocating.
Some minor points: sizeof(char) is 1 by definition and using short
for
key
is pointless - just use int
. Also key
should be defined where it
is used and I would leave a space after the ; in the for loop definition.
Note also that using rand() % n
assumes that the modulo division
is random - that is not what rand
promises.
Here is how I might do it:
static char *rand_string(char *str, size_t size)
{
const char charset[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJK...";
if (size) {
--size;
for (size_t n = 0; n < size; n++) {
int key = rand() % (int) (sizeof charset - 1);
str[n] = charset[key];
}
str[size] = '0円';
}
return str;
}
Edit July 31 23:07UTC
Why would I write the function to take a buffer instead of allocating the string inside the function?
Returning dynamically allocated strings works fine. And if the memory is later freed there is no problem. But writing this sort of function is a great way to leak memory, for example if the caller doesn't know the memory must be freed or forgets to free memory, or even if he frees it in the main path but forgets to free it in other paths etc.
Memory leaks in a desktop applications might not be fatal, but leaks in an embedded system will lead eventually to failure. This can be serious, depending upon the system involved. In many embedded systems, dynamic allocation is often not allowed or is at least best avoided.
Although it certainly is common not to know the size of strings or buffers at compile time, the opposite is also often true. It is often possible to write code with fixed buffer sizes. I always prefer this option if possible so I would be reluctant to use your allocating function. Perhaps it is better to add a wrapper to a non-allocating function for those cases where you really must allocate dynamically (for example when the random string has to outlive the calling context):
char* rand_string_alloc(size_t size)
{
char *s = malloc(size + 1);
if (s) {
rand_string(s, size);
}
return s;
}
-
\$\begingroup\$ note that you don't need to return any result because the str parameter is mutable \$\endgroup\$Quonux– Quonux2013年07月30日 20:09:48 +00:00Commented Jul 30, 2013 at 20:09
-
1\$\begingroup\$ That is true but many standard library functions do this too. It can make life easier, eg in
printf("%s\n", rand_string(buf, sizeof buf));
\$\endgroup\$William Morris– William Morris2013年07月30日 20:12:20 +00:00Commented Jul 30, 2013 at 20:12 -
\$\begingroup\$ your right, if you speak of easy life you grab c++,java,go,python,haskell,ruby or d unless your really need the last 100%-20% performance or your forced to use c \$\endgroup\$Quonux– Quonux2013年07月30日 20:17:14 +00:00Commented Jul 30, 2013 at 20:17
-
1\$\begingroup\$ The questioner is asking specifically about C, not C++, not Java, not go, python etc. Thanks for your input. \$\endgroup\$William Morris– William Morris2013年07月31日 00:50:33 +00:00Commented Jul 31, 2013 at 0:50
-
\$\begingroup\$ @WilliamMorris Could you provide some more insight on why you pass the buffer and not just create the buffer in the string with malloc and return it? Is it because this way it would be easier to free the memory? Is it not possible to free the returned pointer? Thanks. \$\endgroup\$AntonioCS– AntonioCS2013年07月31日 20:56:24 +00:00Commented Jul 31, 2013 at 20:56
Wanted to also give a small input. I had to implement a very similar function for a project of mine. I can't disclose the algorithm, however. But let me try to give you a couple small hints as to how you could further improve yours. I'm foremost concerned with stability and performance (low latency and high throughput) of code.
Let's start with this (profiled for a 1024-byte long string for 10,000 iterations with gprof):
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls us/call us/call name
67.16 0.02 0.02 10000 2.01 2.01 randstring
33.58 0.03 0.01 10000 1.01 1.01 mkrndstr
0.00 0.03 0.00 10000 1.01 1.01 mkrndstrperf
The difference is here really thin as on a few repetitions gprof gives random data, so let's take callgrind instead.
The total cost is then more precisely calculated in function calls:
7,986,402,268 ???:randstring [~perf/t/mkrndstr-notmine]
6,655,222,307 ???:mkrndstr [~perf/t/mkrndstr-notmine]
6,653,436,779 ???:mkrndstr_ipa [~perf/t/mkrndstr-notmine]
6,653,436,778 ???:mkrndstr_ipd [~perf/t/mkrndstr-notmine]
For a 10-byte long string and 10,000 calls, the relative difference is even bigger:
9,968,042 ???:randstring [~perf/t/mkrndstr-notmine]
8,646,775 ???:mkrndstr [~perf/t/mkrndstr-notmine]
6,716,774 ???:mkrndstr_ipa [~perf/t/mkrndstr-notmine]
6,716,774 ???:mkrndstr_ipd [~perf/t/mkrndstr-notmine]
Here, mkrndstr_ip{a,d}
means: in-place for automatically stored and dynamically stored data (different calls, identical functions), respectively.
Some key take-aways beforehand:
down-casting
size_t l = (sizeof(charset) -1); // uncast
versus
int l = (int)(sizeof(charset) -1); // cast to int as suggested in William Morris' reply
makes on that scale a big difference--you avoid passing around loads of superfluous bytes.
The static set of chars is a good idea, saves many cycles and calls, I'd prefix it with a const qualifier.
It's a bad idea to do per-cycle/per-iteration instantiations and identical calculations for the same reasons as in 2 above, make the value, loosely speaking, sticky as Quonux demonstrates it.
Considering randomness. I guess you know libc's
rand()
is an implementation of a PRNG. It's useless for anything serious. If your code gets executed too quickly, i.e., when there's too little interval between any two successive calls torand()
, you'll get the same character from the set. So make sure to pause for a couple CPU cycles. You could also simply read chunks from/dev/urandom
(with theu
prefix, otherwise my "simply" wouldn't hold) or similar on a UNIX derivative. Don't know how to access the random device on Windows.strlen
is indeed slower thansizeof
for clear reasons (expects complex strings), see the implementation in glibc sources, for example, instring/strlen.c
.The rest is in the comments.
Let's get to the code:
Yours, final:
char *randstring(size_t length) { // length should be qualified as const if you follow a rigorous standard static char charset[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.-#'?!"; char *randomString; // initializing to NULL isn't necessary as malloc() returns NULL if it couldn't allocate memory as requested if (length) { randomString = malloc(length +1); // I removed your `sizeof(char) * (length +1)` as sizeof(char) == 1, cf. C99 if (randomString) { for (int n = 0;n < length;n++) { int key = rand() % (int) (sizeof(charset) -1); randomString[n] = charset[key]; } randomString[length] = '0円'; } } return randomString; }
Mine (only slight optimizations):
char *mkrndstr(size_t length) { // const size_t length, supra static char charset[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.-#'?!"; // could be const char *randomString; if (length) { randomString = malloc(length +1); // sizeof(char) == 1, cf. C99 if (randomString) { int l = (int) (sizeof(charset) -1); // (static/global, could be const or #define SZ, would be even better) int key; // one-time instantiation (static/global would be even better) for (int n = 0;n < length;n++) { key = rand() % l; // no instantiation, just assignment, no overhead from sizeof randomString[n] = charset[key]; } randomString[length] = '0円'; } } return randomString; }
Now considering your question regarding dynamic versus automatic data storage:
void mkrndstr_ipa(size_t length, char *randomString) { // const size_t length, supra static char charset[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.-#'?!"; if (length) { if (randomString) { int l = (int) (sizeof(charset) -1); for (int n = 0;n < length;n++) { int key = rand() % l; // per-iteration instantiation randomString[n] = charset[key]; } randomString[length] = '0円'; } } }
There's also an identical function with the modified name as stated way above. Both are called like this:
char *c = malloc(SZ_STR +1); // dynamic, in C on the heap char d[SZ_STR +1]; // "automatic," in C on the stack mkrndstr_ipd(SZ_STR, c); mkrndstr_ipa(SZ_STR, d);
Now the more interesting part:
.globl randstring
.type randstring, @function
randstring:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
pushq %rbx
subq 40,ドル %rsp
.cfi_offset 3, -24
call mcount
movq %rdi, -40(%rbp)
cmpq 0,ドル -40(%rbp)
je .L2
movq -40(%rbp), %rax
addq 1,ドル %rax
movq %rax, %rdi
call malloc
movq %rax, -24(%rbp)
cmpq 0,ドル -24(%rbp)
je .L2
movl 0,ドル -28(%rbp)
jmp .L3
.L4:
call rand
movl %eax, %ecx
movl 1991868891,ドル %edx
movl %ecx, %eax
imull %edx
sarl 5,ドル %edx
movl %ecx, %eax
sarl 31,ドル %eax
movl %edx, %ebx
subl %eax, %ebx
movl %ebx, %eax
movl %eax, -32(%rbp)
movl -32(%rbp), %eax
imull 69,ドル %eax, %eax
movl %ecx, %edx
subl %eax, %edx
movl %edx, %eax
movl %eax, -32(%rbp)
movl -28(%rbp), %eax
movslq %eax, %rdx
movq -24(%rbp), %rax
addq %rax, %rdx
movl -32(%rbp), %eax
cltq
movzbl charset.1808(%rax), %eax
movb %al, (%rdx)
addl 1,ドル -28(%rbp)
.L3:
movl -28(%rbp), %eax
cltq
cmpq -40(%rbp), %rax
jb .L4
movq -40(%rbp), %rax
movq -24(%rbp), %rdx
addq %rdx, %rax
movb 0,ドル (%rax)
.L2:
movq -24(%rbp), %rax
addq 40,ドル %rsp
popq %rbx
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size randstring, .-randstring
compared with:
.globl mkrndstr
.type mkrndstr, @function
mkrndstr:
.LFB1:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq 48,ドル %rsp
call mcount
movq %rdi, -40(%rbp)
cmpq 0,ドル -40(%rbp)
je .L7
movq -40(%rbp), %rax
addq 1,ドル %rax
movq %rax, %rdi
call malloc
movq %rax, -8(%rbp)
cmpq 0,ドル -8(%rbp)
je .L7
movl 69,ドル -16(%rbp)
movl 0,ドル -12(%rbp)
jmp .L8
.L9:
call rand
movl %eax, %edx
sarl 31,ドル %edx
idivl -16(%rbp)
movl %edx, -20(%rbp)
movl -12(%rbp), %eax
movslq %eax, %rdx
movq -8(%rbp), %rax
addq %rax, %rdx
movl -20(%rbp), %eax
cltq
movzbl charset.1818(%rax), %eax
movb %al, (%rdx)
addl 1,ドル -12(%rbp)
.L8:
movl -12(%rbp), %eax
cltq
cmpq -40(%rbp), %rax
jb .L9
movq -40(%rbp), %rax
movq -8(%rbp), %rdx
addq %rdx, %rax
movb 0,ドル (%rax)
.L7:
movq -8(%rbp), %rax
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE1:
.size mkrndstr, .-mkrndstr
I believe an interpretation is not necessary here, the disassembly is then only for visualisation. Compare the different number of instructions.
What I wanted to show is that just a couple of tiny nip'n'tucks does wonders. We sure could go memmove the string into an external buffer at a fixed address or do n-byte-wise assignments. We could also put the set in a macro or use even more static allocation. But let's not exaggerate and go write this outright in ASM, and when we'd be done with it, do the same in pure machine code.
I hope this helps and if it's not a real-time system, then don't do much hassle with the optimizations, stability and reliability (like rand()
) should go first. If you optimize this and that out, something else somewhere else could break. There are a few open-source projects online that show how to optimize excessively but at the cost of an unmaintanable code for newcomers to their teams and sheer complexity for the more accustomed developers.
Last thing I'd like to point you at is that
error: ‘for’ loop initial declarations are only allowed in C99 mode
-
1\$\begingroup\$ I would not expect either putting the size into a variable or defining the
key
variable outside the loop to make any difference. The compiler can optimise this away. Withgcc
using-O2
the two functions (randstring, mkrndstr) give identical assembler. The same is true ofclang
. How are you compiling (which compiler/options)? \$\endgroup\$William Morris– William Morris2013年08月03日 01:38:31 +00:00Commented Aug 3, 2013 at 1:38 -
\$\begingroup\$ Wow. Thanks for such an in-depth analysis of my code. \$\endgroup\$AntonioCS– AntonioCS2013年08月03日 16:28:02 +00:00Commented Aug 3, 2013 at 16:28
-
\$\begingroup\$ @William: I generally test performance without any compiler optimizations.
-O2
, in my opinion, does not always produce stable binaries, it's more about compatibility of your code and the optimizations specified to the compiler. That was an assembly bygcc
. Besides, not always do I run my code on x86 machines, so it may differ again. When I do fine-tuning, I manually pick optimizations instead of the predefined bundles. I prefer to get my code as I want it instead of trusting some compiler that it would improve it. By the way, this also applies to interpreted languages like Perl and Python. \$\endgroup\$Nicholas– Nicholas2013年08月04日 12:59:53 +00:00Commented Aug 4, 2013 at 12:59 -
2\$\begingroup\$ The thing is, code that looks shorter in assembly is not necessarily quicker. Your optimisation of moving the size of the character set to a variable causes the compiler (gcc on IA32) to emit an
idiv
instruction instead of using its most efficient method of dividing bysizeof charset - 1
(69). Testing just this division with gcc on OS-X and Linux, your code, which usesidiv
, is shorter but takes twice as long. With -O2 and with or without a separate variable for the size, the code is shorter still and takes a quarter of the time of the unoptimisedidiv
code. \$\endgroup\$William Morris– William Morris2013年08月04日 19:31:37 +00:00Commented Aug 4, 2013 at 19:31 -
2\$\begingroup\$ @AntinioCS, note that worrying about performance of any particular function is best avoided unless you find you have performance problems. In that case you should find the performance bottlenecks and optimise just those. \$\endgroup\$William Morris– William Morris2013年08月04日 19:39:14 +00:00Commented Aug 4, 2013 at 19:39
the code doesn't look that wrong, but
- the caller should allways check for errors
if (length < 1)
test unnecessarysrand(time(NULL)
should be done outside- error catching (
malloc
) should open no new block short
isn't really at the high of the time, use it only for serilizing/deserilizing raw binary files/network data- use
unsigned
if it makes sense, the counter can't be negative and this is not lala-java-land (besides it can be a bit faster but most times it doesn't matter) (side note, yes i am damaged from looking at the assembler output of the compilers) you don't need to recalculate the
string
length everytime, with C++0x you can maybe let it calculate from the compiler on compiletime#include <string.h> #include <time.h> char *randstring(int length) { char *string = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.-#'?!"; size_t stringLen = 26*2+10+7; char *randomString; randomString = malloc(sizeof(char) * (length +1)); if (!randomString) { return (char*)0; } unsigned int key = 0; for (int n = 0;n < length;n++) { key = rand() % stringLen; randomString[n] = string[key]; } randomString[length] = '0円'; return randomString; }
-
\$\begingroup\$ The
unsigned
(size_t
to be more specific) applies to thesize
parameter as well. \$\endgroup\$Jamal– Jamal2013年07月30日 20:01:49 +00:00Commented Jul 30, 2013 at 20:01
There is one problem with this code: it doesn't generate a random string each time the program is run. The code is missing srand(time(0));
. The srand()
function sets the starting point for producing a series of pseudo-random integers. If srand()
is not called, the rand()
seed is set as if srand(1)
were called at program start. Any other value for seed sets the generator to a different starting point.
For more details, read rand() and srand() in C/C++
-
\$\begingroup\$ As William Morris points out (first paragraph), it does call
srand()
- unfortunately it does so on each call of the function, rather than once in themain()
. \$\endgroup\$Toby Speight– Toby Speight2018年02月09日 14:15:07 +00:00Commented Feb 9, 2018 at 14:15 -
\$\begingroup\$ Yeah William pointed that out, and it doesn't need to call
srand()
every function call once inmain()
is fine. \$\endgroup\$YasinYA– YasinYA2018年02月09日 15:31:44 +00:00Commented Feb 9, 2018 at 15:31
I have a suggestion. It's ok to use a charset but I would recommend using ASCII values to save some memory (not that it matters too much). You can check out ASCII values here, for example, https://www.asciitable.com/ and then you set the interval of numbers you want. In the example below I will use 32-126. Also, as it was already said, you should call srand
on main
or something, since it should be called only once.
What I mean is something like:
#define ASCII_START 32
#define ASCII_END 126
char* generateRandomString(int size) {
int i;
char *res = malloc(size + 1);
for(i = 0; i < size; i++) {
res[i] = (char) (rand()%(ASCII_END-ASCII_START))+ASCII_START;
}
res[i] = '0円';
return res;
}
//srand is not visible, as I said it could be on main.