I am a novice when it comes to Python. I am working stuff related to Flowshop scheduling and so to generate benchmarks I used one by Taillard (Link to his homepage). The details are in the technical report.
Since I ported the code from C which I am not really good at please review my Python script:
# Format - random seed, number of jobs, number of machines
# Use a separate file and precede comments with a # symbol
from numpy import int32
import math
problem = list()
# Generate a random number uniformly between low and high
def unif(seed, low, high):
"""This function generates a random number between low\
and high based on seed
Seed is implemented as static for set of random values based on a series\
of seed values"""
if getattr(unif, "seed_value",0) == 0 or unif.seed_value is None:
#create attribute seed_value which is static for method unif
unif.seed_value = seed #inititalization if attribute does not exist
# constants for random number
a = 16807
b = 127773
c = 2836
m = 2**(31) - 1
k = math.floor(unif.seed_value / b)
unif.seed_value = int32(a * (unif.seed_value % b) - k * c)
# use numpy 32 bit int
if unif.seed_value < 0: # 32 bit int overflow
unif.seed_value = unif.seed_value + m
value_0_1 = unif.seed_value/float(m)
return int(low + math.floor(value_0_1 * (high - low + 1))) # random\
# number generated
# generates a problem for given number of machines and jobs in problem tuple
def generate_flow_shop(p, output):
"""Generates a flow shop problem set for a set in problem tuple and\
writes to file in output"""
time_seed = int(problem[p][0])
d = list() # list containing the problem
for i in range( int(problem[p][2]) ):
line = list()
line = [ unif(time_seed, 1, 99) for j in range(int(problem[p][1])) ]
# for j in range(0, int(problem[p][1])):
# line.append(unif(time_seed, 1, 99))
# 99 - max duration of op.
d.append(line)
unif.seed_value = None
for line in d:
datum = ' '.join(map(str,line))
output.write("%s\n" % (datum))
def main():
name = input("Enter filename to write the generated values:\n> ")
seedfile = input("Enter filename of the seed values:\n> ")
with open(seedfile) as in_file:
for i in in_file:
if line[0] == '#':
continue
else:
line = i.strip().split()
problem.append(tuple( map(int,line) ) )
with open("%s" % (name), "w") as out_file:
for i in range(0, len(problem)):
out_file.write("%s %04d %s %s\n" %\
("p", i, problem[i][1], problem[i][2]))
generate_flow_shop(i, out_file)
if __name__ == '__main__':
main()
The link to the original C program is here.
This is the link to the output which is the expected output ie. the code is actually correct.
The main purpose of this module is to generate a set of values that can be used to test our implemented solutions ie. it is a generated problem set but not completely random but reproducible if the seeds entered are the same.
Note: The seedfile is a file that contains the values that are defined in the struct problem in the C program.
What I mainly need feedback on are:
- Can the script be improved further with respect to use of Python idioms?
- Since the generated file should again be parsed as input for testing, is writing to a file and then opening it again to test the scripts the best way? Is the written file format the best one for parsing again as input?
2 Answers 2
unif.seed_value
is very un-Pythonic. Instead use a class.- Rather than using
getattr(unif, "seed_value",0)
, you can makeself.seed_value
at object instantiation, and use that as the default, unless it's falsy, where it'd instead useseed
. math.floor
is mostly the same asint
. If you're using this on division, you can change it to floor divisions using//
.numpy.int32
is odd, it adds a dependency tonumpy
that shouldn't be needed. Instead you can use&
or%
, this also means that it doesn't throw an error if the number is greater than(1 << 31) - 1
.unif.seed_value/float(m)
is the same asunif.seed_value/m
in Python 3. In Python 2 it'd do floor division, unless one number is a float.generate_flow_shop
shouldn't need to takeoutput
, that should be implemented inmain
.generate_flow_shop
should take a single problem rather than the list of problems.- You can use tuple unpacking to give all values in the problem a name.
- I heavily recommend
str.format
over the old%
formatting operator. It has less bugs, and is enhanced in 3.6 with f-strings. - You can use
_
as a 'throw away variable' in Python, it's somewhat a standard and lets developers ignore that variable, greatly increasing readability.
unif
uses some fairly odd math, and so I decided to go for a more succinct approach.
And so I'd change your code to: (note, it's untested)
class Random:
def __init__(self):
self.seed_value = None
def unif(self, seed, low, high):
seed_value = self.seed_value or seed
a = 16807
b = 127773
c = 2836
m = (1 << 31) - 1
seed_value = (a * (seed_value % b) - (seed_value // b) * c)
self.seed_value = (seed_value + (seed_value < 0)) & m
return int(low + self.seed_value / m * (high - low + 1))
def generate_flow_shop(self, problem):
"""Generates a flow shop problem set for a set in problem tuple and\
writes to file in output"""
time_seed, num_jobs, num_mach = problem
r = [
[
self.unif(time_seed, 1, 99)
for _ in range(num_jobs)
]
for _ in range(num_mach)
]
self.seed_value = None
return r
def main():
name = input("Enter filename to write the generated values:\n> ")
seedfile = input("Enter filename of the seed values:\n> ")
r = Random()
with open(seedfile) as in_file:
problems = [
tuple(map(int, line.strip().split()))
for line in in_file
if line[0] != "#"
]
with open(name, "w") as out_file:
for i, problem in enumerate(problems):
out_file.write("p {0:04} {1[1]} {1[2]}\n".format(i, problem)))
data = r.generate_flow_shop(problem)
out_file.write(''.join(' '.join(line) + '\n' for line in data))
if __name__ == '__main__':
main()
-
\$\begingroup\$ The overflow part of the int which should be an error was incorporated as a feature in the program and so the values seem to differ because in your program. There are few trivial bugs in it like extra brackets and join used on int but nothing too big. How do I reproduce overflow without numpy int32? Because without it the generated values change. Other than that the code is blazing fast so thanks on the other suggestions. \$\endgroup\$Kronos– Kronos2017年05月26日 06:26:02 +00:00Commented May 26, 2017 at 6:26
-
\$\begingroup\$ I used this to imitate int32 overflow behaviour -
if -(m+1) > seed_value or seed_value > m: seed_value %= (2*m-1) if seed_value > m: seed_value = m - seed_value
\$\endgroup\$Kronos– Kronos2017年05月26日 06:56:30 +00:00Commented May 26, 2017 at 6:56
As far as styling goes- Triple quotation marks signifying a docstring end should be on a seperate line:
def func():
"""This docstring explains
how the function should
be used.
"""
Also, there is no need to escape newlines in docstrings. I suppose you could open the file in w+ or r+ mode (wr / rw);
with open(seedfile, "r+") as f:
# magic