There's no test suite. I know it's hard for you to write test cases because of the randomness. You probably want to take a hint from Mennon van Slooten and provide a way for to change the random choices to a deterministic sequence of choices.
Your collections of example data (
_male_first_name
etc.) are stored as tuples, but it would be better to store these as lists. Tuples are fixed-size collections typically used as lightweight representations of records. Lists sets are variable-size collections typically containing similar items.Your example data is very America-centric. Given that the purpose of your module is to produce example data for testing, it will be useful to have a wider variety of names, including names with accented letters, or in different scripts.
It is generally results in source code that is easier to read (with fewer quotation characters and commas) if you produce this kind of data using
split
. For example,_female_first_names = u""" Mary Patricia Linda Barbara Elizabeth Zoé Yến София فاطمة 明子 美玲 ยิ่งลักษณ์ """.split()
The function
_random_item
is already in the Python library asrandom.choice
.The string
"0123456789"
is already in the Python library asstring.digits
.It's not clear why
_random_data
needs to strip all initial@
signs from its argument before trying to look it up. You only call this function in four places: either with a literal argument (for example,_random_data('@LETTER_LOWER')
) where you could just call the corresponding function directly, or in a context where the calleryou could have easily strippedstrip the@
themselves. It also fails to raise an error if the key is missing (so that mistakes like@NUBMER
are not caught).Your definitions of
data
will look nicer (fewer quotation marks) if you use thedict
constructor:data = dict( NUMBER = lambda: _random_item("0123456789"), LETTER_UPPER = lambda: _random_item(string.uppercase), LETTER_LOWER = lambda: _random_item(string.lowercase), ...
Since most of the values in
data
are of the formlambda: _random_item(...)
, why not special-case them to save on boilerplate? You could write something like:def _random_data(key): """ Construct a random data item for `key` and return it. Raise KeyError if `key` is unknown. """ constructor = data[key] if isinstance(constructor, types.FunctionType): return constructor() else: return random.choice(constructor)
increment = 0
template_key = (id(template), key)
if template_key in self._increments:
increment = _increments[template_key]
elif m.group(4):
increment = int(m.group(4))
_increments[template_key] = increment
OK, let's put all that together. (There are a couple of bonus improvements in here for you to spot for yourself!)
_constantly_zero = itertools.repeat(0)
class _Instantiator(object):
def __init__(self):
self._increments = dict()
def instantiate(self, template, have_n = False, n = 1,
increment = _constantly_zero):
if isinstance(template, dict):
generated = {}
for key, value in template.iteritems():
m = re.match(r"^(\w+)(?:\|(?:(\d+)-(\d+)|\+(\d+)))?$", key)
if not m:
raise ValueError("Bad key: {0}".format(key))
have_n = False
n = 1
if m.group(2):
have_n = True
n = random.randint(int(m.group(2)), int(m.group(3)))
template_key = (id(template), key)
increment = _constantly_zero
if template_key in self._increments:
increment = self._increments[template_key]
elif m.group(4):
increment = itertools.count(start = 0, step = int(m.group(4)))
self._increments[template_key] = increment
generated[m.group(1)] = self.instantiate(value, have_n, n, increment)
return generated
elif isinstance(template, list):
return [self.instantiate(template[0]) for _ in xrange(n)]
elif isinstance(template, bool):
if have_n:
return [False, True][n]bool(n)
else:
return template
elif isinstance(template, int):
if have_n:
return n
else:
return template + next(increment)
elif isinstance(template, str) or isinstance(template, unicode):
if template:
pattern = re.compile(r'@([A-Z_0-9]+)')
repl = lambda m: _random_data(m.group(1))
return ''.join(pattern.sub(repl, template) for _ in xrange(n))
else:
return ''.join(random.choice(string.letters) for _ in xrange(n))
else:
return template
def instantiate_object(template):
"""
Instantiate an object based on `template` and return it.
"""
return _Instantiator().instantiate(template)
There's no test suite. I know it's hard for you to write test cases because of the randomness. You probably want to take a hint from Mennon van Slooten and provide a way for to change the random choices to a deterministic sequence of choices.
Your collections of example data (
_male_first_name
etc.) are stored as tuples, but it would be better to store these as lists. Tuples are fixed-size collections typically used as lightweight representations of records. Lists sets are variable-size collections typically containing similar items.Your example data is very America-centric. Given that the purpose of your module is to produce example data for testing, it will be useful to have a wider variety of names, including names with accented letters, or in different scripts.
It is generally results in source code that is easier to read (with fewer quotation characters and commas) if you produce this kind of data using
split
. For example,_female_first_names = u""" Mary Patricia Linda Barbara Elizabeth Zoé Yến София فاطمة 明子 美玲 ยิ่งลักษณ์ """.split()
The function
_random_item
is already in the Python library asrandom.choice
.The string
"0123456789"
is already in the Python library asstring.digits
.It's not clear why
_random_data
needs to strip all initial@
signs from its argument before trying to look it up. You only call this function in four places: either with a literal argument (for example,_random_data('@LETTER_LOWER')
) where you could just call the corresponding function directly, or in a context where the caller could have easily stripped the@
themselves. It also fails to raise an error if the key is missing (so that mistakes like@NUBMER
are not caught).Your definitions of
data
will look nicer (fewer quotation marks) if you use thedict
constructor:data = dict( NUMBER = lambda: _random_item("0123456789"), LETTER_UPPER = lambda: _random_item(string.uppercase), LETTER_LOWER = lambda: _random_item(string.lowercase), ...
Since most of the values in
data
are of the formlambda: _random_item(...)
, why not special-case them to save boilerplate? You could write something like:def _random_data(key): """ Construct a random data item for `key` and return it. Raise KeyError if `key` is unknown. """ constructor = data[key] if isinstance(constructor, types.FunctionType): return constructor() else: return random.choice(constructor)
increment = 0
template_key = (id(template), key)
if template_key in self._increments:
increment = _increments[template_key]
elif m.group(4):
increment = int(m.group(4))
_increments[template_key] = increment
OK, let's put all that together.
_constantly_zero = itertools.repeat(0)
class _Instantiator(object):
def __init__(self):
self._increments = dict()
def instantiate(self, template, have_n = False, n = 1,
increment = _constantly_zero):
if isinstance(template, dict):
generated = {}
for key, value in template.iteritems():
m = re.match(r"^(\w+)(?:\|(?:(\d+)-(\d+)|\+(\d+)))?$", key)
if not m:
raise ValueError("Bad key: {0}".format(key))
have_n = False
n = 1
if m.group(2):
have_n = True
n = random.randint(int(m.group(2)), int(m.group(3)))
template_key = (id(template), key)
increment = _constantly_zero
if template_key in self._increments:
increment = self._increments[template_key]
elif m.group(4):
increment = itertools.count(start = 0, step = int(m.group(4)))
self._increments[template_key] = increment
generated[m.group(1)] = self.instantiate(value, have_n, n, increment)
return generated
elif isinstance(template, list):
return [self.instantiate(template[0]) for _ in xrange(n)]
elif isinstance(template, bool):
if have_n:
return [False, True][n]
else:
return template
elif isinstance(template, int):
if have_n:
return n
else:
return template + next(increment)
elif isinstance(template, str) or isinstance(template, unicode):
if template:
pattern = re.compile(r'@([A-Z_0-9]+)')
repl = lambda m: _random_data(m.group(1))
return ''.join(pattern.sub(repl, template) for _ in xrange(n))
else:
return ''.join(random.choice(string.letters) for _ in xrange(n))
else:
return template
def instantiate_object(template):
"""
Instantiate an object based on `template` and return it.
"""
return _Instantiator().instantiate(template)
There's no test suite. I know it's hard for you to write test cases because of the randomness. You probably want to take a hint from Mennon van Slooten and provide a way for to change the random choices to a deterministic sequence of choices.
Your collections of example data (
_male_first_name
etc.) are stored as tuples, but it would be better to store these as lists. Tuples are fixed-size collections typically used as lightweight representations of records. Lists sets are variable-size collections typically containing similar items.Your example data is very America-centric. Given that the purpose of your module is to produce example data for testing, it will be useful to have a wider variety of names, including names with accented letters, or in different scripts.
It is generally results in source code that is easier to read (with fewer quotation characters and commas) if you produce this kind of data using
split
. For example,_female_first_names = u""" Mary Patricia Linda Barbara Elizabeth Zoé Yến София فاطمة 明子 美玲 ยิ่งลักษณ์ """.split()
The function
_random_item
is already in the Python library asrandom.choice
.The string
"0123456789"
is already in the Python library asstring.digits
.It's not clear why
_random_data
needs to strip all initial@
signs from its argument before trying to look it up. You only call this function in a context where you could easily strip the@
. It also fails to raise an error if the key is missing (so that mistakes like@NUBMER
are not caught).Your definitions of
data
will look nicer (fewer quotation marks) if you use thedict
constructor:data = dict( NUMBER = lambda: _random_item("0123456789"), LETTER_UPPER = lambda: _random_item(string.uppercase), LETTER_LOWER = lambda: _random_item(string.lowercase), ...
Since most of the values in
data
are of the formlambda: _random_item(...)
, why not special-case them to save on boilerplate? You could write something like:def _random_data(key): """ Construct a random data item for `key` and return it. Raise KeyError if `key` is unknown. """ constructor = data[key] if isinstance(constructor, types.FunctionType): return constructor() else: return random.choice(constructor)
increment = 0
template_key = (id(template), key)
if template_key in _increments:
increment = _increments[template_key]
elif m.group(4):
increment = int(m.group(4))
_increments[template_key] = increment
OK, let's put all that together. (There are a couple of bonus improvements in here for you to spot for yourself!)
_constantly_zero = itertools.repeat(0)
class _Instantiator(object):
def __init__(self):
self._increments = dict()
def instantiate(self, template, have_n = False, n = 1,
increment = _constantly_zero):
if isinstance(template, dict):
generated = {}
for key, value in template.iteritems():
m = re.match(r"^(\w+)(?:\|(?:(\d+)-(\d+)|\+(\d+)))?$", key)
if not m:
raise ValueError("Bad key: {0}".format(key))
have_n = False
n = 1
if m.group(2):
have_n = True
n = random.randint(int(m.group(2)), int(m.group(3)))
template_key = (id(template), key)
increment = _constantly_zero
if template_key in self._increments:
increment = self._increments[template_key]
elif m.group(4):
increment = itertools.count(start = 0, step = int(m.group(4)))
self._increments[template_key] = increment
generated[m.group(1)] = self.instantiate(value, have_n, n, increment)
return generated
elif isinstance(template, list):
return [self.instantiate(template[0]) for _ in xrange(n)]
elif isinstance(template, bool):
if have_n:
return bool(n)
else:
return template
elif isinstance(template, int):
if have_n:
return n
else:
return template + next(increment)
elif isinstance(template, str) or isinstance(template, unicode):
if template:
pattern = re.compile(r'@([A-Z_0-9]+)')
repl = lambda m: _random_data(m.group(1))
return ''.join(pattern.sub(repl, template) for _ in xrange(n))
else:
return ''.join(random.choice(string.letters) for _ in xrange(n))
else:
return template
def instantiate_object(template):
"""
Instantiate an object based on `template` and return it.
"""
return _Instantiator().instantiate(template)
The name could be improved: generate has a special meaning in Python (referring to the output of a generator), and the json part is misleading since there's nothing JSON-specific about this function (it operates on Python objects, not on JSON representations). Since instantiate is often used for filling in a template, I think I would use a name like
instantiate_object
. (Andinstantiate_json
forgenerate_json
.)You have to go through contortions because you are overloading the meaning ofThe variable
length
is misleadingly named because it's not always a length (it might just be a number). Perhapsn
would be better. You also have to go through contortions because you are overloading its meaning: it'sNone
if no lengthnumber was specified (in which case sometimes it needneeds to be treated as if it has the value1
), or else it's a number. It would be much clearer to have a separate variable (have_n
in my code below) that indicates whether a number was supplied.Your regular expression for finding repeat counts in JSON keys is
\w+\|(\d+)-(\d+)
. This matches strings like"a|0-2trailing garbage"
. It's probably a good idea to anchor it to the end of the string (\w\|(\d+)-(\d+)$
). Or maybe, depending on exactly what you want to match, to the beginning of the string as well:re.match(r'\w+\|(\d+)-(\d+)$', name)
. But then you need to decide what to do for non-matching keys. Maybe raise an exception?The structure
matches = re.search(...) if matches: ...
The name could be improved: generate has a special meaning in Python (referring to the output of a generator), and the json part is misleading since there's nothing JSON-specific about this function (it operates on Python objects, not on JSON representations). Since instantiate is often used for filling in a template, I think I would use a name like
instantiate_object
. (Andinstantiate_json
forgenerate_json
.)You have to go through contortions because you are overloading the meaning of
length
: it'sNone
if no length was specified (in which case sometimes it need to be treated as if it has the value1
)Your regular expression for finding repeat counts in JSON keys is
\w+\|(\d+)-(\d+)
. This matches strings like"a|0-2trailing garbage"
. It's probably a good idea to anchor it to the end of the string (\w\|(\d+)-(\d+)$
). Or maybe, depending on exactly what you want to match, to the beginning of the string as well:re.match(r'\w+\|(\d+)-(\d+)$', name)
. But then you need to decide what to do for non-matching keys. Maybe raise an exception?The structure
matches = re.search(...) if matches: ...
The name could be improved: generate has a special meaning in Python (referring to the output of a generator), and the json part is misleading since there's nothing JSON-specific about this function (it operates on Python objects, not on JSON representations). Since instantiate is often used for filling in a template, I think I would use a name like
instantiate_object
. (Andinstantiate_json
forgenerate_json
.)The variable
length
is misleadingly named because it's not always a length (it might just be a number). Perhapsn
would be better. You also have to go through contortions because you are overloading its meaning: it'sNone
if no number was specified (in which case sometimes it needs to be treated as if it has the value1
), or else it's a number. It would be much clearer to have a separate variable (have_n
in my code below) that indicates whether a number was supplied.Your regular expression for finding repeat counts in JSON keys is
\w+\|(\d+)-(\d+)
. This matches strings like"a|0-2trailing garbage"
. It's probably a good idea to anchor it to the end of the string (\w\|(\d+)-(\d+)$
). Or maybe, depending on exactly what you want to match, to the beginning of the string as well:re.match(r'\w+\|(\d+)-(\d+)$', name)
. But then you need to decide what to do for non-matching keys. Maybe raise an exception?The structure
matches = re.search(...) if matches: ...
1. Introduction
This review grew to be very long, so I'll say up front that you shouldn't take the length of this to heart: your code is not bad, especially if you are new to Python. There's always a lot of things to say about a piece of code of this length, and idiomatic Python has a bunch of features (sets, generators, comprehensions, iterators) that will be unfamiliar to users of some other languages. So take everything I have to say with a pinch of salt (except for item #1 under "General comments", which is really the only big problem here).
2. General comments
There's no test suite. I know it's hard for you to write test cases because of the randomness. You probably want to take a hint from Mennon van Slooten and provide a way for to change the random choices to a deterministic sequence of choices.
Your collections of example data (
_male_first_name
etc.) are stored as tuples, but it would be better to store these as lists. Tuples are fixed-size collections typically used as lightweight representations of records. Lists sets are variable-size collections typically containing similar items.Your example data is very America-centric. Given that the purpose of your module is to produce example data for testing, it will be useful to have a wider variety of names, including names with accented letters, or in different scripts.
It is generally results in source code that is easier to read (with fewer quotation characters and commas) if you produce this kind of data using
split
. For example,_female_first_names = u""" Mary Patricia Linda Barbara Elizabeth Zoé Yến София فاطمة 明子 美玲 ยิ่งลักษณ์ """.split()
The function
_random_item
is already in the Python library asrandom.choice
.The string
"0123456789"
is already in the Python library asstring.digits
.It's not clear why
_random_data
needs to strip all initial@
signs from its argument before trying to look it up. You only call this function in four places: either with a literal argument (for example,_random_data('@LETTER_LOWER')
) where you could just call the corresponding function directly, or in a context where the caller could have easily stripped the@
themselves. It also fails to raise an error if the key is missing (so that mistakes like@NUBMER
are not caught).Your definitions of
data
will look nicer (fewer quotation marks) if you use thedict
constructor:data = dict( NUMBER = lambda: _random_item("0123456789"), LETTER_UPPER = lambda: _random_item(string.uppercase), LETTER_LOWER = lambda: _random_item(string.lowercase), ...
Since most of the values in
data
are of the formlambda: _random_item(...)
, why not special-case them to save boilerplate? You could write something like:def _random_data(key): """ Construct a random data item for `key` and return it. Raise KeyError if `key` is unknown. """ constructor = data[key] if isinstance(constructor, types.FunctionType): return constructor() else: return random.choice(constructor)
Building a string by repeatedly extending it with
+=
is a well-known anti-pattern in Python. This is because extending a string is inefficient in Python: a new string gets allocated and the old string copied across each time. This means that any algorithm that builds a string by repeated extension runs like O(n2). It is nearly always better to generate the items to be assembled into the string and thenjoin
them to produce the result. This technique also avoidavoids fencepost difficultieserrors like spurious extra spaces at the start or end. So your could write:def _lorem_ipsum(): """ Return a random paragraph of placeholder text. """ length = random.randrange(2, len(_lorem) / 2) return ' '.join(random.choice(_lorem) for _ in xrange(length))
generate_json
takes an optional argumentname
which is always ignored. You can omit this.
My remaining comments apply to generate_json_object
.
3. Comments on generate_json_object
It is usually better in Python to test if an object belongs to a type
It is usually better in Python to test if an object belongs to a typet
by writingisinstance(object, t)
rather thantype(object) is t
. The reason is thatobject
may belong to a subtype oft
(for example, adefaultdict
orOrderedDict
instead of a plaindict
). There is a gotcha here, which is thebool
is a subtype ofint
, so you need to order your tests so that Booleans get tested before integers.t
by writingisinstance(object, t)
rather thantype(object) is t
. The reason is thatobject
may belong to a subtype oft
(for example, adefaultdict
orOrderedDict
instead of a plaindict
).
There is a gotcha here, which is that bool
is a subtype of int
, so you need to order your tests so that Booleans get tested before integers.
The structure of
generate_json_object
involves various branches which assign an object to the variablegenerated
. This is then returned at the end. Why not just return the object directly and avoid the local variable?When you're building a list, you do this by repeatedly calling
append
. This is not quite as bad as repeatedly calling+=
on a string (lists are a bit more flexible), but you can simplify this code by using a list comprehension:elif isinstance(template, list): return [generate_json_object(template[0]) for _ in xrange(length)]
Where should this _increments
dictionary be stored? We need a new one for each instantiation we do, but it needs to persist throughout the recursive series of calls. We could pass it around as another function parameter, but if you find yourself doing this, that's a sign that you need to use a class. Again, see the final version at the end for how this can be done.
4. Revised instantiation code
This review grew to be very long, so I'll say up front that you shouldn't take the length of this to heart: your code is not bad, especially if you are new to Python. There's always a lot of things to say about a piece of code of this length, and idiomatic Python has a bunch of features (sets, generators, comprehensions, iterators) that will be unfamiliar to users of some other languages. So take everything I have to say with a pinch of salt (except for item #1, which is really the only big problem here).
Your collections of example data (
_male_first_name
etc.) are stored as tuples, but it would be better to store these as lists. Tuples are fixed-size collections typically used as lightweight representations of records. Lists sets are variable-size collections typically containing similar items.Your example data is very America-centric. Given that the purpose of your module is to produce example data for testing, it will be useful to have a wider variety of names, including names with accented letters, or in different scripts.
It is generally results in source code that is easier to read (with fewer quotation characters and commas) if you produce this kind of data using
split
. For example,_female_first_names = u""" Mary Patricia Linda Barbara Elizabeth Zoé Yến София فاطمة 明子 美玲 ยิ่งลักษณ์ """.split()
The function
_random_item
is already in the Python library asrandom.choice
.The string
"0123456789"
is already in the Python library asstring.digits
.It's not clear why
_random_data
needs to strip all initial@
signs from its argument before trying to look it up. You only call this function in four places: either with a literal argument (for example,_random_data('@LETTER_LOWER')
) where you could just call the corresponding function directly, or in a context where the caller could have easily stripped the@
themselves. It also fails to raise an error if the key is missing (so that mistakes like@NUBMER
are not caught).Your definitions of
data
will look nicer (fewer quotation marks) if you use thedict
constructor:data = dict( NUMBER = lambda: _random_item("0123456789"), LETTER_UPPER = lambda: _random_item(string.uppercase), LETTER_LOWER = lambda: _random_item(string.lowercase), ...
Since most of the values in
data
are of the formlambda: _random_item(...)
, why not special-case them to save boilerplate? You could write something like:def _random_data(key): """ Construct a random data item for `key` and return it. Raise KeyError if `key` is unknown. """ constructor = data[key] if isinstance(constructor, types.FunctionType): return constructor() else: return random.choice(constructor)
Building a string by repeatedly extending it with
+=
is a well-known anti-pattern in Python. This is because extending a string is inefficient in Python: a new string gets allocated and the old string copied across each time. This means that any algorithm that builds a string by repeated extension runs like O(n2). It is nearly always better to generate the items to be assembled into the string and thenjoin
them to produce the result. This technique also avoid fencepost difficulties like spurious extra spaces at the start or end. So your could write:def _lorem_ipsum(): """ Return a random paragraph of placeholder text. """ length = random.randrange(2, len(_lorem) / 2) return ' '.join(random.choice(_lorem) for _ in xrange(length))
generate_json
takes an optional argumentname
which is always ignored. You can omit this.
My remaining comments apply to generate_json_object
.
It is usually better in Python to test if an object belongs to a type
t
by writingisinstance(object, t)
rather thantype(object) is t
. The reason is thatobject
may belong to a subtype oft
(for example, adefaultdict
orOrderedDict
instead of a plaindict
). There is a gotcha here, which is thebool
is a subtype ofint
, so you need to order your tests so that Booleans get tested before integers.The structure of
generate_json_object
involves various branches which assign an object to the variablegenerated
. This is then returned at the end. Why not just return the object directly and avoid the local variable?When you're building a list, you do this by repeatedly calling
append
. This is not quite as bad as repeatedly calling+=
on a string (lists are a bit more flexible), but you can simplify this code by using a list comprehension:elif isinstance(template, list): return [generate_json_object(template[0]) for _ in xrange(length)]
Where should this _increments
dictionary be stored? We need a new one for each instantiation we do, but it needs to persist throughout the recursive series of calls. We could pass it around as another function parameter, but if you find yourself doing this, that's a sign that you need to a class. Again, see the final version at the end for how this can be done.
1. Introduction
This review grew to be very long, so I'll say up front that you shouldn't take the length of this to heart: your code is not bad, especially if you are new to Python. There's always a lot of things to say about a piece of code of this length, and idiomatic Python has a bunch of features (sets, generators, comprehensions, iterators) that will be unfamiliar to users of some other languages. So take everything I have to say with a pinch of salt (except for item #1 under "General comments", which is really the only big problem here).
2. General comments
There's no test suite. I know it's hard for you to write test cases because of the randomness. You probably want to take a hint from Mennon van Slooten and provide a way for to change the random choices to a deterministic sequence of choices.
Your collections of example data (
_male_first_name
etc.) are stored as tuples, but it would be better to store these as lists. Tuples are fixed-size collections typically used as lightweight representations of records. Lists sets are variable-size collections typically containing similar items.Your example data is very America-centric. Given that the purpose of your module is to produce example data for testing, it will be useful to have a wider variety of names, including names with accented letters, or in different scripts.
It is generally results in source code that is easier to read (with fewer quotation characters and commas) if you produce this kind of data using
split
. For example,_female_first_names = u""" Mary Patricia Linda Barbara Elizabeth Zoé Yến София فاطمة 明子 美玲 ยิ่งลักษณ์ """.split()
The function
_random_item
is already in the Python library asrandom.choice
.The string
"0123456789"
is already in the Python library asstring.digits
.It's not clear why
_random_data
needs to strip all initial@
signs from its argument before trying to look it up. You only call this function in four places: either with a literal argument (for example,_random_data('@LETTER_LOWER')
) where you could just call the corresponding function directly, or in a context where the caller could have easily stripped the@
themselves. It also fails to raise an error if the key is missing (so that mistakes like@NUBMER
are not caught).Your definitions of
data
will look nicer (fewer quotation marks) if you use thedict
constructor:data = dict( NUMBER = lambda: _random_item("0123456789"), LETTER_UPPER = lambda: _random_item(string.uppercase), LETTER_LOWER = lambda: _random_item(string.lowercase), ...
Since most of the values in
data
are of the formlambda: _random_item(...)
, why not special-case them to save boilerplate? You could write something like:def _random_data(key): """ Construct a random data item for `key` and return it. Raise KeyError if `key` is unknown. """ constructor = data[key] if isinstance(constructor, types.FunctionType): return constructor() else: return random.choice(constructor)
Building a string by repeatedly extending it with
+=
is a well-known anti-pattern in Python. This is because extending a string is inefficient in Python: a new string gets allocated and the old string copied across each time. This means that any algorithm that builds a string by repeated extension runs like O(n2). It is nearly always better to generate the items to be assembled into the string and thenjoin
them to produce the result. This technique also avoids fencepost errors like spurious extra spaces at the start or end. So your could write:def _lorem_ipsum(): """ Return a random paragraph of placeholder text. """ length = random.randrange(2, len(_lorem) / 2) return ' '.join(random.choice(_lorem) for _ in xrange(length))
generate_json
takes an optional argumentname
which is always ignored. You can omit this.
3. Comments on generate_json_object
- It is usually better in Python to test if an object belongs to a type
t
by writingisinstance(object, t)
rather thantype(object) is t
. The reason is thatobject
may belong to a subtype oft
(for example, adefaultdict
orOrderedDict
instead of a plaindict
).
There is a gotcha here, which is that bool
is a subtype of int
, so you need to order your tests so that Booleans get tested before integers.
The structure of
generate_json_object
involves various branches which assign an object to the variablegenerated
. This is then returned at the end. Why not just return the object directly and avoid the local variable?When you're building a list, you do this by repeatedly calling
append
. This is not quite as bad as repeatedly calling+=
on a string (lists are a bit more flexible), but you can simplify this code by using a list comprehension:elif isinstance(template, list): return [generate_json_object(template[0]) for _ in xrange(length)]
Where should this _increments
dictionary be stored? We need a new one for each instantiation we do, but it needs to persist throughout the recursive series of calls. We could pass it around as another function parameter, but if you find yourself doing this, that's a sign that you need to use a class. Again, see the final version at the end for how this can be done.