The requirements for this one were (original SO question):
- Generate a random-ish sequence of items.
- Sequence should have each item N times.
- Sequence shouldn't have serial runs longer than a given number (longest below).
The solution was actually drafted by another user, this is my implementation (influenced by random.shuffle).
from random import random
from itertools import groupby # For testing the result
try: xrange
except: xrange = range
def generate_quasirandom(values, n, longest=3, debug=False):
# Sanity check
if len(values) < 2 or longest < 1:
raise ValueError
# Create a list with n * [val]
source = []
sourcelen = len(values) * n
for val in values:
source += [val] * n
# For breaking runs
serial = 0
latest = None
for i in xrange(sourcelen):
# Pick something from source[:i]
j = int(random() * (sourcelen - i)) + i
if source[j] == latest:
serial += 1
if serial >= longest:
serial = 0
guard = 0
# We got a serial run, break it
while source[j] == latest:
j = int(random() * (sourcelen - i)) + i
guard += 1
# We just hit an infinit loop: there is no way to avoid a serial run
if guard > 10:
print("Unable to avoid serial run, disabling asserts.")
debug = False
break
else:
serial = 0
latest = source[j]
# Move the picked value to source[i:]
source[i], source[j] = source[j], source[i]
# More sanity checks
check_quasirandom(source, values, n, longest, debug)
return source
def check_quasirandom(shuffled, values, n, longest, debug):
counts = []
# We skip the last entries because breaking runs in them get too hairy
for val, count in groupby(shuffled):
counts.append(len(list(count)))
highest = max(counts)
print('Longest run: %d\nMax run lenght:%d' % (highest, longest))
# Invariants
assert len(shuffled) == len(values) * n
for val in values:
assert shuffled.count(val) == n
if debug:
# Only checked if we were able to avoid a sequential run >= longest
assert highest <= longest
for x in xrange(10, 1000):
generate_quasirandom((0, 1, 2, 3), 1000, x//10, debug=True)
I'd like to receive any input you have on improving this code, from style to comments to tests and anything else you can think of.
1 Answer 1
A couple of possible code improvements that I noticed:
sourcelen = len(values) * n
This seems unnecessarily complicated to me. I mean, after a second of thinking the reader of this line will realize that len(values) * n
is indeed the length of source
, but that's still one step of thinking more than would be required if you just did sourcelen = len(source)
(after populating source
of course).
That being said, I don't see why you need to store the length of source
in a variable at all. Doing for i in xrange(len(source)):
isn't really less readable or less efficient than doing for i in xrange(sourcelen):
, so I'd just get rid of the variable altogether.
source = []
for val in values:
source += [val] * n
This can be written as a list comprehension like this:
source = [x for val in values for x in [val]*n]
Using list comprehensions is usually considered more idiomatic python than building up a list iteratively. It is also often more efficient.
As Fred Nurk points out, the list comprehension can also be written as
source = [val for val in values for _ in xrange(n)]
which avoids the creation of a temporary list and is maybe a bit more readable.
j = int(random() * (sourcelen - i)) + i
To get a random integer between x
(inclusive) and y
(exclusive), you can use random.randrange(x,y)
, so the above can be written as:
j = randrange(i, len(source) - i)
(You'll also need to import randrange
instead of random
from the random module). This makes it more immediately obviously that j
is a random number between i
and len(source) - i
and introduces less room for mistakes.
-
\$\begingroup\$ Alternatively:
source = [val for val in values for _ in xrange(n)]
which, for me, makes it more clear you're repeating values compared to creating a temporary list, multiplying it, and then iterating it. \$\endgroup\$Fred Nurk– Fred Nurk2011年01月27日 09:20:50 +00:00Commented Jan 27, 2011 at 9:20