Generating 1000 random numbers

Question 1

I have got the following C code and I feel it is crazy stupid and there must be a better way to do this, but just can't think of one and have yet to learn algorithms.

/*generates 1000 random numbers, lists frequency*/
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(void)
{
 int count, generated_num ;
 char nums[10] = {0};
 srand(time(NULL));
 for (count = 0; count < 1000; count++) {
 generated_num = rand() % 10 + 1;
 nums[generated_num-1] += 1;
 }
 for (count = 0; count < 10; count++) {
 printf("%d occured %d times\n", count+1, nums[count]);
 }
 return 0;
}

How do I learn to write code that is both efficient in speed but also efficient in the sense it is readable/modifiable/not-a-mess?

Question 2

What is the purpose of your code? What are you trying to determine? Are you trying to measure the behavior of rand() or are you trying to roll "fairly" and keep count to see how much it varies from an even distribution?

Question 3

You should be aware that this doesn't actually create a uniform distribution, but there is a way to address that. Whether it's worthwhile doing so for your application is up to you, but you should be aware of the issue.

Question 4

Please remember to vote on any answers you've found helpful. You may also accept the single most helpful answer.

Question 5

If you care (even a little bit) about the quality of random numbers you generate you should use a better RNG library than rand()+srand. There is the modulo bias @Edward mentioned, but it's small compared to the effect of rand sucking.

Question 6

actually using char or short would most probably be slower on modern architectures, you can do a benchmark and see. You should use int instead and only use smaller types in very large arrays that don't fit in cache

Question 7

Speed and efficiency

Just don't worry about it. You need to generate 1000 random numbers, and do a little bit of accounting work. Any reasonable solution, such as yours, will perform similarly.

Types

Why use an array of char to keep track of the number of occurrences? You're putting 1000 random numbers into 10 bins. What if one of those bins gets more than 127 items? You would get an overflow.

In summary, those bins should be of type int. Don't even think about skimping on a few bytes of storage, if you want your code to be not-a-mess.

Naming

nums actually stores counts. count, as you've used it in the second loop, actually refers to the generated numbers. I suggest the following renaming:

nums → bins
count → i

Question 8

You should elaborated on the bias issue and how to address it.

Question 9

+1 for noting that count should be renamed to i - the non-canonical loop variable was the very first thing that stuck out to me when I read the code.

Question 10

@Edward I posted the answer in a rush. On further thought, a bias of 1 part in 2 billion is too insignificant to make a fuss over in most situations, so I have retracted that part of the answer.

Question 11

@200_success: Whether it's a problem or not depends on the use case.

Question 12

@Edward: If bias in this situation is a problem, you shouldn't be using rand() in the first place.

Question 13

Overall, this code doesn't look too bad, but there are a few places that can be improved.

This line:
```
char nums[10] = {0};
```
doesn't initialize all the elements of nums (I think some compilers will default-initialize local variables in certain build modes, but you shouldn't rely on that). Use a for loop or the standard library function memset() to do that:
```
#include <string.h>
...
memset(nums, 0, sizeof(nums));
```
As 200_success notes, using a char as the base type of the array leaves you prone to overflow if you happen to randomly generate more than 127 of a given number (assuming your char has 8 bits, which it probably does). For counting things, it's generally a good idea to use unsigned quantities if you know your count will never be negative, so nums should be declared as:
```
unsigned int nums[10];
```
In this code, you're adding 1 to generated_num, then subtracting 1 when you use it on the next line as an index:
```
generated_num = rand() % 10 + 1;
nums[generated_num-1] += 1;
```
Leave that out:
```
generated_num = rand() % 10;
nums[generated_num] += 1;
```
You could conceivably collapse the line into one, saving yourself the declaration of generated_num at the start of main:
```
nums[rand() % 10] += 1;
```
You could also use the pre- or post-increment operators to update the count for the generated number:
```
nums[rand() % 10]++;
```
You use the magic number 10 (the size of the nums array) in several places in the code. You should declare that in a constant, one of:
```
// Number of bins in the histogram of random numbers.
static const size_t NUM_BINS = 10;
// ... or ...
#define NUM_BINS 10
// ...
char nums[NUM_BINS]; // etc.
```
That way, if you decide you want 11 or 12 or 37 random numbers, you only have to change it in one place.

Question 14

Regarding your first point: C99 6.7.8 § 10 explicitly indicates that it would be initialised to all zeroes.

Question 15

And char nums[10] = {0}; is both more readable and (almost certainly) faster to execute than char nums[10]; for (int i = 0; i < sizeof(nums) / sizeof(*nums); ++i) { nums[i] = 0; }. If the loop version or the memset version came across my desk for a code review, I'd ask that it be changed to the initialization version.

Question 16

@Snowbody: They certainly could...but then, they could also redefine 0 or sizeof(char) if they felt like it. The question is how far you want to go to accommodate compilers that can't be bothered to comply with a well-known standard that's been around for a decade and a half. (Me, i'd say "not very far at all".)

Question 17

@Snowbody: To be clear, the C standard even before C99 specified the same thing. C89 (the first official C standard) said: "If there are fewer initializers in a list than there are members of an aggregate, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration." and the rule for implicit initialization of static objects is "it is initialized implicitly as if every member that has arithmetic type were assigned 0 and every member that has pointer type were assigned a null pointer constant."

Question 18

In C the {0} construct is known as the universal (zero) initializer because it can be applied to every type (though gcc seems to not like it by emitting an undeserved warning when using it on a compound type, but that's gcc's problem for being stupid). Please use it, that's what it's for.

Question 19

Others have already pointed out the some things about the coding style. Following their advice, you will be on par with more professional C code. Unfortunately, that will still not give you good random numbers.

Now, depending on what you want to do, that might not be too bad, but I still wanted to point it out. The C function rand is (are) a very very bad random number generator. Given only few (full) outputs, one can already deduce some information about the next numbers.

Also srand(time(NULL)); is a relatively bad seed. Running the program on two different computers but at the same time (in seconds!) will result in identical numbers. Running the program in short succession will result in similar numbers.

Furthermore, rand() % 10 will not create evenly distributed numbers. RAND_MAX is usually not divisible by 10, so the lower numbers will be more likely than 8 or 9. This is even more pronounced when you want to draw numbers from a larger range. RAND_MAX is often only of the order of 32k, so drawing numbers in a range of 1-2000 or similar will already be significantly skewed towards the lower numbers.

tl;dr If the random numbers are not important (e.g. a computer game), everything should be fine. If you ever want to do numerics or cryptography, then do not use rand().

edit: It is somewhat difficult to get good random numbers in C...

If you want to do cryptography then read /dev/random on linux systems (note that under OSX /dev/random is not cryptographically secure!) or call CryptGenRandom on windows systems. Both will deliver 'true' random numbers. For numerics you can (for example) use a Mersenne Twister (creative-commons code on the wikipedia) and seed it with true random numbers or with (preferably a hash of) the current time in nanoseconds.

In either case you still have to do some work to get a uniform distribution over any given range. A simple but ineffective method simply draws a new random number when needed

/// @arg (*random) - a function providing (pseudo) random unsigned numbers
/// @arg randMax - the largest number that can be returned by (*random)
/// @arg rangeMax - the largest desired number
/// @returns a uniformly random number in [0, rangeMax]
unsigned uniform(unsigned (*random)(void), unsigned randMax, unsigned rangeMax) {
 if (rangeMax <= randMax) {
 // the following assumes, that both maxima are smaller than 2^32-1
 randMax += 1;
 rangeMax += 1;
 unsigned cutoff = (randMax / rangeMax) * rangeMax;
 unsigned result = random();
 while (result >= cutoff) {
 result = random();
 }
 return result % rangeMax;
 } else {
 // more complicated code or error
 }
}

If you do not want to throw away any random numbers that you sampled you will have to manage some internal state. This quickly becomes complicated and is beyond the scope of this answer.

Question 20

It would be nice if you mentioned the alternative to rand.

Question 21

@ckuhn203: It may just be a general idea. There aren't too many options in C, unless one tries to implement something from scratch.

Question 22

@ckuhn203 I am not aware of any library that provides a simple solution for C but I added some pointers on how to obtain proper random numbers manually.

Question 23

This is actually pretty concise for a simple program, but I have two additional things:

Declare count right before the loop since it's a loop counter variable. You shouldn't just "list" every variable at the top, as it may make it harder to keep track of them if they're no longer used.

For loop counters in general, this is applicable to pre-C99. If you have a newer version, thereby allowing you to initialize the counter variable within the loop statement, then do that instead.
srand() can be moved to the very top of main() to help keep it separate, although this not really a huge deal. The important thing in general, though, is to only call it in main() in more modular programs (to avoid resetting the seed).

Question 24

I have always declared all variables at the beginning, as this is how I've seen ii done and always done it. What do you mean with srand()? Are you saying to make it global as long as it is only used in one function?

Question 25

@user50585: Oh, sorry; I meant the very top of main(). I'll fix that.

Question 26

But where would I put it if multiple functions used it? In main at the top, or as a global? What about spread across multiple files, only one file can have it, right.

Question 27

@user50585: This is probably a habit from the olden days, when you actually HAD to do that (not sure if applies to C++, but it does to C). Nowadays you can just do for (int i = 0; i < 1000; i++). It improves readability and ensures you don't use the iteration variable somewhere else because you forget that you set it in the loop by limiting its scope to the loop itself.

Question 28

@user50585: Yes, it's bad. It decreases readability by separating the variable names (and types) from the spots where they're actually used, which makes it harder for a reader to know, at the point where it's used, what type the variable is. It also makes it more work to remove a variable - you have to remove it both from the list it was declared in and from the place where it was used, so the "all variables declared at the start of a block" method tends to leave lots of unused variables lying around that no one bothered to remove.

200_success 200_success 146k22 gold badges190 silver badges479 bronze badges · Answer 1 · 2014-08-05 18:06:09Z

19

\$\begingroup\$

Speed and efficiency

Just don't worry about it. You need to generate 1000 random numbers, and do a little bit of accounting work. Any reasonable solution, such as yours, will perform similarly.

Types

Why use an array of char to keep track of the number of occurrences? You're putting 1000 random numbers into 10 bins. What if one of those bins gets more than 127 items? You would get an overflow.

In summary, those bins should be of type int. Don't even think about skimping on a few bytes of storage, if you want your code to be not-a-mess.

Naming

nums actually stores counts. count, as you've used it in the second loop, actually refers to the generated numbers. I suggest the following renaming:

nums → bins
count → i

Share

edited Aug 5, 2014 at 20:01

answered Aug 5, 2014 at 18:06

200_success's user avatar

200_success 200_success

146k22 gold badges190 silver badges479 bronze badges

\$\endgroup\$

5

5

\$\begingroup\$ You should elaborated on the bias issue and how to address it. \$\endgroup\$

Edward
– Edward

2014年08月05日 18:16:41 +00:00
Commented Aug 5, 2014 at 18:16
1

\$\begingroup\$ +1 for noting that count should be renamed to i - the non-canonical loop variable was the very first thing that stuck out to me when I read the code. \$\endgroup\$

godlygeek
– godlygeek

2014年08月05日 19:11:27 +00:00
Commented Aug 5, 2014 at 19:11
\$\begingroup\$ @Edward I posted the answer in a rush. On further thought, a bias of 1 part in 2 billion is too insignificant to make a fuss over in most situations, so I have retracted that part of the answer. \$\endgroup\$

200_success
– 200_success

2014年08月05日 20:04:06 +00:00
Commented Aug 5, 2014 at 20:04
\$\begingroup\$ @200_success: Whether it's a problem or not depends on the use case. \$\endgroup\$

Edward
– Edward

2014年08月05日 21:12:10 +00:00
Commented Aug 5, 2014 at 21:12
\$\begingroup\$ @Edward: If bias in this situation is a problem, you shouldn't be using rand() in the first place. \$\endgroup\$

Gabe
– Gabe

2014年08月06日 04:11:23 +00:00
Commented Aug 6, 2014 at 4:11

Add a comment |

Niall C. Niall C. 8591 gold badge7 silver badges19 bronze badges · Answer 2 · 2014-08-05 18:05:10Z

Overall, this code doesn't look too bad, but there are a few places that can be improved.

This line:
```
char nums[10] = {0};
```
doesn't initialize all the elements of nums (I think some compilers will default-initialize local variables in certain build modes, but you shouldn't rely on that). Use a for loop or the standard library function memset() to do that:
```
#include <string.h>
...
memset(nums, 0, sizeof(nums));
```
As 200_success notes, using a char as the base type of the array leaves you prone to overflow if you happen to randomly generate more than 127 of a given number (assuming your char has 8 bits, which it probably does). For counting things, it's generally a good idea to use unsigned quantities if you know your count will never be negative, so nums should be declared as:
```
unsigned int nums[10];
```
In this code, you're adding 1 to generated_num, then subtracting 1 when you use it on the next line as an index:
```
generated_num = rand() % 10 + 1;
nums[generated_num-1] += 1;
```
Leave that out:
```
generated_num = rand() % 10;
nums[generated_num] += 1;
```
You could conceivably collapse the line into one, saving yourself the declaration of generated_num at the start of main:
```
nums[rand() % 10] += 1;
```
You could also use the pre- or post-increment operators to update the count for the generated number:
```
nums[rand() % 10]++;
```
You use the magic number 10 (the size of the nums array) in several places in the code. You should declare that in a constant, one of:
```
// Number of bins in the histogram of random numbers.
static const size_t NUM_BINS = 10;
// ... or ...
#define NUM_BINS 10
// ...
char nums[NUM_BINS]; // etc.
```
That way, if you decide you want 11 or 12 or 37 random numbers, you only have to change it in one place.

Regarding your first point: C99 6.7.8 § 10 explicitly indicates that it would be initialised to all zeroes.
And char nums[10] = {0}; is both more readable and (almost certainly) faster to execute than char nums[10]; for (int i = 0; i < sizeof(nums) / sizeof(*nums); ++i) { nums[i] = 0; }. If the loop version or the memset version came across my desk for a code review, I'd ask that it be changed to the initialization version.
@Snowbody: They certainly could...but then, they could also redefine 0 or sizeof(char) if they felt like it. The question is how far you want to go to accommodate compilers that can't be bothered to comply with a well-known standard that's been around for a decade and a half. (Me, i'd say "not very far at all".)
@Snowbody: To be clear, the C standard even before C99 specified the same thing. C89 (the first official C standard) said: "If there are fewer initializers in a list than there are members of an aggregate, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration." and the rule for implicit initialization of static objects is "it is initialized implicitly as if every member that has arithmetic type were assigned 0 and every member that has pointer type were assigned a null pointer constant."
In C the {0} construct is known as the universal (zero) initializer because it can be applied to every type (though gcc seems to not like it by emitting an undeserved warning when using it on a compound type, but that's gcc's problem for being stupid). Please use it, that's what it's for.

example example 1713 bronze badges · Answer 3 · 2014-08-05 23:58:22Z