I am trying to make an example of Strategy design pattern in Python 3.x. Below is my code. I feel like this can be improved but I don't know how. The run
methods seem to take more responsibilities for testing and executing. So does __init__
function. The class takes an external function to create an object but also:
- checks if input arguments for the function
process
are None - loads an object if exists to avoid re-calculations
- checks if user wants to create an object regardless conditions
- tests if an object to return is None
- returns an object.
I am wondering how this can be improved. This is the code:
class Strategy(object):
def __init__(self, filename, forcecreate=False):
if (filename is None) or (not isinstance(filename, str)) or (filename == ""):
raise ValueError("File name is not set")
self.filename = filename
self.forcecreate = forcecreate
self.t = None
self.output = None
def run(self, process, *inputs):
if not self.forcecreate:
self.output = self.__pre()
if not self.output is None:
return self.output
# Check if objects provided are defined
if len(inputs) > 0:
if any([i is None for i in inputs]):
raise ValueError("One of the objects is not set")
self.output = process(*inputs)
self.__post()
return self.output
def __pre(self):
if (Path(self.filename).is_file()):
print("File exists")
with open(self.filename, 'rb') as f_handler:
return pickle.load(f_handler)
return None
def __post(self):
with open(self.filename, 'wb') as f_handler:
pickle.dump(self.output, f_handler)
I have a set of functions which process data and return objects which contain some structured data. I also save objects in order to avoid re-calculations. If a file exists, I can just load an object and return it. The number of functions is getting higher and I end up with copy-pasting. I thought that would slightly improve my code.
1 Answer 1
The strategy design pattern is useful mostly when you want to parametrize an algorithm with different behaviours. That implies you would have more than one strategy, which is not the case here. As such, calling this a strategy might be confusing.
Instead, it might be better to view your class as a decorator that can cache the result of a computation in a file. However, that would raise two questions:
Why is the cached data independent from the
process
that can produce the data, and why is the cached data independent of the input? For example, this behaviour might not be expected:def multiply_by_2(*xs): return [2 * x for x in xs] def add_3(*xs): return [x + 3 for x in xs] cache.run(multiply_by_2, 1, 2, 3) #=> [2, 4, 6] cache.run(multiply_by_2, 3, 4, 5) #=> [2, 4, 6] cache.run(add_3, 1, 2, 3) #=> [2, 4, 6]
That the function may be changed can be prevented by providing the process to the constructor, i.e. associating a cache file with a function before your cache is queried. A different function would then get a different cache.
That the data may vary between invocations is difficult to prevent unless you also store the input data in the file. You may be interested in looking at the
functools.lru_cache()
decorator that can cache the last n calls in memory, and looks up the correct cache entry by argument values.If the cache object decorates an existing function (and effectively just offers a single public method
run()
, then it may be convenient to let the cache also offer a function-like interface: either by implementing a__call__()
method, or by actually just using nested functions. With the latter approach, you might implement a decorator like:def cache(filename, forcecreate=False): def inner(process): def cached(*input): ... # basically your run() method return cached return inner
Used like:
@cache("foo.data") def cache_me(a, b, c): ... result = cache_me(1, 2, 3)
or:
cached_multiply_by_2 = cache("multiply by 2.data")(multiply_by_2) result = cached_multiply_by_2(1, 2, 3)
It is also possible to do this with an object with a
__call__()
method. Such an object is actually preferable for debugging since you can inspect the properties of an object.
Other notes:
The
self.t
field is never read. It can be safely removed.While
__post()
reads fromself.output
, the corresponding__pre()
method does not write to that variable, returning a result instead. It would be better if either both methods directly access theself.output
state or both methods are "pure" so that__post()
would take an parameter.The names of pre and post are slightly confusing. I would associate them with asserting preconditions and postconditions. Instead, they
load()
andstore()
cached data.Your cache format is unable to cache a
None
value. When a None is loaded, the process is re-run again. Instead, load/store the data within a container, e.g. a single-element list. Then an empty list indicates the absence of a value, whereas a single-element list (even if that element is None) indicates the presence of a value.if not self.output is None:
would usually be written asif self.output is not None:
.The outer parentheses in
if (Path(self.filename).is_file()):
are unnecessary.Output like
print("File exists")
deep inside your code is not a good idea. If you want to do logging, use a dedicated logging function that is switched off by default. If you output logging, prefer writing tosys.stderr
rather thansys.stdout
.any([i is None for i in inputs])
creates a list unnecessarily. A generator expression is sufficient:any(i is None for i in inputs)
.It is weird that your class checks the arguments for None-ness. If the wrapped
process()
cannot handle nones it should check that itself. Or you could define a separate decorator that you could apply independently:def all_not_none(function): @functools.wraps(function) def inner(*args): if any(x is None for x in args): raise ValueError(...) return function(*args) return inner
(Using the
functools.wraps
decorator fixes docstrings and names of the returned function object to imitate the original function more closely.)
-
\$\begingroup\$ First, thanks for the list of "other notes". It's really helpful. Your overall answer covers really advanced topics. I did not think of cache in this class. I did not realised it uses it. My first aim was to read data from disk if available or calculate and store. I read through your answer but I feel like I need to read it again. Also, I need to get more familiar with
functools
which I was not aware of before. I should perhaps read a good book to understand other more advanced techniques in Python. Thanks. \$\endgroup\$Celdor– Celdor2017年11月14日 20:07:04 +00:00Commented Nov 14, 2017 at 20:07