Quasi-random sequences: how to improve style and tests?

Question 1

The requirements for this one were (original SO question):

Generate a random-ish sequence of items.
Sequence should have each item N times.
Sequence shouldn't have serial runs longer than a given number (longest below).

The solution was actually drafted by another user, this is my implementation (influenced by random.shuffle).

from random import random
from itertools import groupby # For testing the result
try: xrange
except: xrange = range
def generate_quasirandom(values, n, longest=3, debug=False):
 # Sanity check
 if len(values) < 2 or longest < 1:
 raise ValueError
 # Create a list with n * [val]
 source = []
 sourcelen = len(values) * n
 for val in values:
 source += [val] * n
 # For breaking runs
 serial = 0
 latest = None
 for i in xrange(sourcelen):
 # Pick something from source[:i]
 j = int(random() * (sourcelen - i)) + i
 if source[j] == latest:
 serial += 1
 if serial >= longest:
 serial = 0
 guard = 0
 # We got a serial run, break it
 while source[j] == latest:
 j = int(random() * (sourcelen - i)) + i
 guard += 1
 # We just hit an infinit loop: there is no way to avoid a serial run
 if guard > 10:
 print("Unable to avoid serial run, disabling asserts.")
 debug = False
 break
 else:
 serial = 0
 latest = source[j]
 # Move the picked value to source[i:]
 source[i], source[j] = source[j], source[i]
 # More sanity checks
 check_quasirandom(source, values, n, longest, debug)
 return source
def check_quasirandom(shuffled, values, n, longest, debug):
 counts = []
 # We skip the last entries because breaking runs in them get too hairy
 for val, count in groupby(shuffled):
 counts.append(len(list(count)))
 highest = max(counts)
 print('Longest run: %d\nMax run lenght:%d' % (highest, longest))
 # Invariants
 assert len(shuffled) == len(values) * n
 for val in values:
 assert shuffled.count(val) == n
 if debug:
 # Only checked if we were able to avoid a sequential run >= longest
 assert highest <= longest
for x in xrange(10, 1000):
 generate_quasirandom((0, 1, 2, 3), 1000, x//10, debug=True)

I'd like to receive any input you have on improving this code, from style to comments to tests and anything else you can think of.

Question 2

A couple of possible code improvements that I noticed:

sourcelen = len(values) * n

This seems unnecessarily complicated to me. I mean, after a second of thinking the reader of this line will realize that len(values) * n is indeed the length of source, but that's still one step of thinking more than would be required if you just did sourcelen = len(source) (after populating source of course).

That being said, I don't see why you need to store the length of source in a variable at all. Doing for i in xrange(len(source)): isn't really less readable or less efficient than doing for i in xrange(sourcelen):, so I'd just get rid of the variable altogether.

source = []
for val in values:
 source += [val] * n

This can be written as a list comprehension like this:

source = [x for val in values for x in [val]*n]

Using list comprehensions is usually considered more idiomatic python than building up a list iteratively. It is also often more efficient.

As Fred Nurk points out, the list comprehension can also be written as

source = [val for val in values for _ in xrange(n)]

which avoids the creation of a temporary list and is maybe a bit more readable.

j = int(random() * (sourcelen - i)) + i

To get a random integer between x (inclusive) and y (exclusive), you can use random.randrange(x,y), so the above can be written as:

j = randrange(i, len(source) - i)

(You'll also need to import randrange instead of random from the random module). This makes it more immediately obviously that j is a random number between i and len(source) - i and introduces less room for mistakes.

Question 3

Alternatively: source = [val for val in values for _ in xrange(n)] which, for me, makes it more clear you're repeating values compared to creating a temporary list, multiplying it, and then iterating it.

sepp2k sepp2k 9,0122 gold badges39 silver badges51 bronze badges · Accepted Answer · 2011-01-26 23:33:23Z

A couple of possible code improvements that I noticed:

sourcelen = len(values) * n

This seems unnecessarily complicated to me. I mean, after a second of thinking the reader of this line will realize that len(values) * n is indeed the length of source, but that's still one step of thinking more than would be required if you just did sourcelen = len(source) (after populating source of course).

That being said, I don't see why you need to store the length of source in a variable at all. Doing for i in xrange(len(source)): isn't really less readable or less efficient than doing for i in xrange(sourcelen):, so I'd just get rid of the variable altogether.

source = []
for val in values:
 source += [val] * n

This can be written as a list comprehension like this:

source = [x for val in values for x in [val]*n]

Using list comprehensions is usually considered more idiomatic python than building up a list iteratively. It is also often more efficient.

As Fred Nurk points out, the list comprehension can also be written as

source = [val for val in values for _ in xrange(n)]

which avoids the creation of a temporary list and is maybe a bit more readable.

j = int(random() * (sourcelen - i)) + i

To get a random integer between x (inclusive) and y (exclusive), you can use random.randrange(x,y), so the above can be written as:

j = randrange(i, len(source) - i)

(You'll also need to import randrange instead of random from the random module). This makes it more immediately obviously that j is a random number between i and len(source) - i and introduces less room for mistakes.

Alternatively: source = [val for val in values for _ in xrange(n)] which, for me, makes it more clear you're repeating values compared to creating a temporary list, multiplying it, and then iterating it.

Stack Exchange Network

Quasi-random sequences: how to improve style and tests?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Quasi-random sequences: how to improve style and tests?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions