I have got the following C code and I feel it is crazy stupid and there must be a better way to do this, but just can't think of one and have yet to learn algorithms.
/*generates 1000 random numbers, lists frequency*/
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(void)
{
int count, generated_num ;
char nums[10] = {0};
srand(time(NULL));
for (count = 0; count < 1000; count++) {
generated_num = rand() % 10 + 1;
nums[generated_num-1] += 1;
}
for (count = 0; count < 10; count++) {
printf("%d occured %d times\n", count+1, nums[count]);
}
return 0;
}
How do I learn to write code that is both efficient in speed but also efficient in the sense it is readable/modifiable/not-a-mess?
4 Answers 4
Speed and efficiency
Just don't worry about it. You need to generate 1000 random numbers, and do a little bit of accounting work. Any reasonable solution, such as yours, will perform similarly.
Types
Why use an array of char
to keep track of the number of occurrences? You're putting 1000 random numbers into 10 bins. What if one of those bins gets more than 127 items? You would get an overflow.
In summary, those bins should be of type int
. Don't even think about skimping on a few bytes of storage, if you want your code to be not-a-mess.
Naming
nums
actually stores counts. count
, as you've used it in the second loop, actually refers to the generated numbers. I suggest the following renaming:
nums
→bins
count
→i
-
5\$\begingroup\$ You should elaborated on the bias issue and how to address it. \$\endgroup\$Edward– Edward2014年08月05日 18:16:41 +00:00Commented Aug 5, 2014 at 18:16
-
1\$\begingroup\$ +1 for noting that
count
should be renamed toi
- the non-canonical loop variable was the very first thing that stuck out to me when I read the code. \$\endgroup\$godlygeek– godlygeek2014年08月05日 19:11:27 +00:00Commented Aug 5, 2014 at 19:11 -
\$\begingroup\$ @Edward I posted the answer in a rush. On further thought, a bias of 1 part in 2 billion is too insignificant to make a fuss over in most situations, so I have retracted that part of the answer. \$\endgroup\$200_success– 200_success2014年08月05日 20:04:06 +00:00Commented Aug 5, 2014 at 20:04
-
\$\begingroup\$ @200_success: Whether it's a problem or not depends on the use case. \$\endgroup\$Edward– Edward2014年08月05日 21:12:10 +00:00Commented Aug 5, 2014 at 21:12
-
\$\begingroup\$ @Edward: If bias in this situation is a problem, you shouldn't be using
rand()
in the first place. \$\endgroup\$Gabe– Gabe2014年08月06日 04:11:23 +00:00Commented Aug 6, 2014 at 4:11
Overall, this code doesn't look too bad, but there are a few places that can be improved.
This line:
char nums[10] = {0};
doesn't initialize all the elements of
nums
(I think some compilers will default-initialize local variables in certain build modes, but you shouldn't rely on that). Use afor
loop or the standard library functionmemset()
to do that:#include <string.h> ... memset(nums, 0, sizeof(nums));
As 200_success notes, using a
char
as the base type of the array leaves you prone to overflow if you happen to randomly generate more than 127 of a given number (assuming yourchar
has 8 bits, which it probably does). For counting things, it's generally a good idea to useunsigned
quantities if you know your count will never be negative, sonums
should be declared as:unsigned int nums[10];
In this code, you're adding 1 to
generated_num
, then subtracting 1 when you use it on the next line as an index:generated_num = rand() % 10 + 1; nums[generated_num-1] += 1;
Leave that out:
generated_num = rand() % 10; nums[generated_num] += 1;
You could conceivably collapse the line into one, saving yourself the declaration of
generated_num
at the start of main:nums[rand() % 10] += 1;
You could also use the pre- or post-increment operators to update the count for the generated number:
nums[rand() % 10]++;
You use the magic number
10
(the size of thenums
array) in several places in the code. You should declare that in a constant, one of:// Number of bins in the histogram of random numbers. static const size_t NUM_BINS = 10; // ... or ... #define NUM_BINS 10 // ... char nums[NUM_BINS]; // etc.
That way, if you decide you want 11 or 12 or 37 random numbers, you only have to change it in one place.
-
7\$\begingroup\$ Regarding your first point: C99 6.7.8 § 10 explicitly indicates that it would be initialised to all zeroes. \$\endgroup\$Schism– Schism2014年08月05日 18:39:54 +00:00Commented Aug 5, 2014 at 18:39
-
1\$\begingroup\$ And
char nums[10] = {0};
is both more readable and (almost certainly) faster to execute thanchar nums[10]; for (int i = 0; i < sizeof(nums) / sizeof(*nums); ++i) { nums[i] = 0; }
. If the loop version or the memset version came across my desk for a code review, I'd ask that it be changed to the initialization version. \$\endgroup\$godlygeek– godlygeek2014年08月05日 19:06:59 +00:00Commented Aug 5, 2014 at 19:06 -
4\$\begingroup\$ @Snowbody: They certainly could...but then, they could also redefine
0
orsizeof(char)
if they felt like it. The question is how far you want to go to accommodate compilers that can't be bothered to comply with a well-known standard that's been around for a decade and a half. (Me, i'd say "not very far at all".) \$\endgroup\$cHao– cHao2014年08月05日 19:52:52 +00:00Commented Aug 5, 2014 at 19:52 -
2\$\begingroup\$ @Snowbody: To be clear, the C standard even before C99 specified the same thing. C89 (the first official C standard) said: "If there are fewer initializers in a list than there are members of an aggregate, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration." and the rule for implicit initialization of static objects is "it is initialized implicitly as if every member that has arithmetic type were assigned 0 and every member that has pointer type were assigned a null pointer constant." \$\endgroup\$godlygeek– godlygeek2014年08月05日 20:38:23 +00:00Commented Aug 5, 2014 at 20:38
-
1\$\begingroup\$ In C the
{0}
construct is known as the universal (zero) initializer because it can be applied to every type (thoughgcc
seems to not like it by emitting an undeserved warning when using it on a compound type, but that'sgcc
's problem for being stupid). Please use it, that's what it's for. \$\endgroup\$Thomas– Thomas2014年08月06日 02:22:22 +00:00Commented Aug 6, 2014 at 2:22
Others have already pointed out the some things about the coding style. Following their advice, you will be on par with more professional C code. Unfortunately, that will still not give you good random numbers.
Now, depending on what you want to do, that might not be too bad, but I still wanted to point it out. The C function rand
is (are) a very very bad random number generator. Given only few (full) outputs, one can already deduce some information about the next numbers.
Also srand(time(NULL));
is a relatively bad seed. Running the program on two different computers but at the same time (in seconds!) will result in identical numbers. Running the program in short succession will result in similar numbers.
Furthermore, rand() % 10
will not create evenly distributed numbers. RAND_MAX
is usually not divisible by 10, so the lower numbers will be more likely than 8 or 9. This is even more pronounced when you want to draw numbers from a larger range. RAND_MAX
is often only of the order of 32k, so drawing numbers in a range of 1-2000 or similar will already be significantly skewed towards the lower numbers.
tl;dr If the random numbers are not important (e.g. a computer game), everything should be fine. If you ever want to do numerics or cryptography, then do not use rand()
.
edit: It is somewhat difficult to get good random numbers in C...
If you want to do cryptography then read /dev/random
on linux systems (note that under OSX /dev/random
is not cryptographically secure!) or call CryptGenRandom
on windows systems. Both will deliver 'true' random numbers. For numerics you can (for example) use a Mersenne Twister (creative-commons code on the wikipedia) and seed it with true random numbers or with (preferably a hash of) the current time in nanoseconds.
In either case you still have to do some work to get a uniform distribution over any given range. A simple but ineffective method simply draws a new random number when needed
/// @arg (*random) - a function providing (pseudo) random unsigned numbers
/// @arg randMax - the largest number that can be returned by (*random)
/// @arg rangeMax - the largest desired number
/// @returns a uniformly random number in [0, rangeMax]
unsigned uniform(unsigned (*random)(void), unsigned randMax, unsigned rangeMax) {
if (rangeMax <= randMax) {
// the following assumes, that both maxima are smaller than 2^32-1
randMax += 1;
rangeMax += 1;
unsigned cutoff = (randMax / rangeMax) * rangeMax;
unsigned result = random();
while (result >= cutoff) {
result = random();
}
return result % rangeMax;
} else {
// more complicated code or error
}
}
If you do not want to throw away any random numbers that you sampled you will have to manage some internal state. This quickly becomes complicated and is beyond the scope of this answer.
-
2\$\begingroup\$ It would be nice if you mentioned the alternative to
rand
. \$\endgroup\$RubberDuck– RubberDuck2014年08月06日 00:17:32 +00:00Commented Aug 6, 2014 at 0:17 -
1\$\begingroup\$ @ckuhn203: It may just be a general idea. There aren't too many options in C, unless one tries to implement something from scratch. \$\endgroup\$Jamal– Jamal2014年08月06日 00:55:05 +00:00Commented Aug 6, 2014 at 0:55
-
\$\begingroup\$ @ckuhn203 I am not aware of any library that provides a simple solution for C but I added some pointers on how to obtain proper random numbers manually. \$\endgroup\$example– example2014年08月06日 10:29:46 +00:00Commented Aug 6, 2014 at 10:29
This is actually pretty concise for a simple program, but I have two additional things:
Declare
count
right before the loop since it's a loop counter variable. You shouldn't just "list" every variable at the top, as it may make it harder to keep track of them if they're no longer used.For loop counters in general, this is applicable to pre-C99. If you have a newer version, thereby allowing you to initialize the counter variable within the loop statement, then do that instead.
srand()
can be moved to the very top ofmain()
to help keep it separate, although this not really a huge deal. The important thing in general, though, is to only call it inmain()
in more modular programs (to avoid resetting the seed).
-
\$\begingroup\$ I have always declared all variables at the beginning, as this is how I've seen ii done and always done it. What do you mean with
srand()
? Are you saying to make it global as long as it is only used in one function? \$\endgroup\$user50585– user505852014年08月05日 18:28:38 +00:00Commented Aug 5, 2014 at 18:28 -
\$\begingroup\$ @user50585: Oh, sorry; I meant the very top of
main()
. I'll fix that. \$\endgroup\$Jamal– Jamal2014年08月05日 18:29:36 +00:00Commented Aug 5, 2014 at 18:29 -
\$\begingroup\$ But where would I put it if multiple functions used it? In
main
at the top, or as a global? What about spread across multiple files, only one file can have it, right. \$\endgroup\$user50585– user505852014年08月05日 18:31:02 +00:00Commented Aug 5, 2014 at 18:31 -
2\$\begingroup\$ @user50585: This is probably a habit from the olden days, when you actually HAD to do that (not sure if applies to C++, but it does to C). Nowadays you can just do
for (int i = 0; i < 1000; i++)
. It improves readability and ensures you don't use the iteration variable somewhere else because you forget that you set it in the loop by limiting its scope to the loop itself. \$\endgroup\$user622505– user6225052014年08月05日 19:12:40 +00:00Commented Aug 5, 2014 at 19:12 -
1\$\begingroup\$ @user50585: Yes, it's bad. It decreases readability by separating the variable names (and types) from the spots where they're actually used, which makes it harder for a reader to know, at the point where it's used, what type the variable is. It also makes it more work to remove a variable - you have to remove it both from the list it was declared in and from the place where it was used, so the "all variables declared at the start of a block" method tends to leave lots of unused variables lying around that no one bothered to remove. \$\endgroup\$godlygeek– godlygeek2014年08月05日 19:20:45 +00:00Commented Aug 5, 2014 at 19:20
rand()
or are you trying to roll "fairly" and keep count to see how much it varies from an even distribution? \$\endgroup\$rand()
+srand
. There is the modulo bias @Edward mentioned, but it's small compared to the effect ofrand
sucking. \$\endgroup\$