How to reduce responsibility of the main function in strategy pattern (python)

Question 1

I am trying to make an example of Strategy design pattern in Python 3.x. Below is my code. I feel like this can be improved but I don't know how. The run methods seem to take more responsibilities for testing and executing. So does __init__ function. The class takes an external function to create an object but also:

checks if input arguments for the function process are None
loads an object if exists to avoid re-calculations
checks if user wants to create an object regardless conditions
tests if an object to return is None
returns an object.

I am wondering how this can be improved. This is the code:

class Strategy(object):
 def __init__(self, filename, forcecreate=False):
 if (filename is None) or (not isinstance(filename, str)) or (filename == ""):
 raise ValueError("File name is not set")
 self.filename = filename
 self.forcecreate = forcecreate
 self.t = None
 self.output = None
 def run(self, process, *inputs):
 if not self.forcecreate:
 self.output = self.__pre()
 if not self.output is None:
 return self.output
 # Check if objects provided are defined
 if len(inputs) > 0:
 if any([i is None for i in inputs]):
 raise ValueError("One of the objects is not set")
 self.output = process(*inputs)
 self.__post()
 return self.output
 def __pre(self):
 if (Path(self.filename).is_file()):
 print("File exists")
 with open(self.filename, 'rb') as f_handler:
 return pickle.load(f_handler)
 return None
 def __post(self):
 with open(self.filename, 'wb') as f_handler:
 pickle.dump(self.output, f_handler)

I have a set of functions which process data and return objects which contain some structured data. I also save objects in order to avoid re-calculations. If a file exists, I can just load an object and return it. The number of functions is getting higher and I end up with copy-pasting. I thought that would slightly improve my code.

Question 2

The strategy design pattern is useful mostly when you want to parametrize an algorithm with different behaviours. That implies you would have more than one strategy, which is not the case here. As such, calling this a strategy might be confusing.

Instead, it might be better to view your class as a decorator that can cache the result of a computation in a file. However, that would raise two questions:

Why is the cached data independent from the process that can produce the data, and why is the cached data independent of the input? For example, this behaviour might not be expected:
```
def multiply_by_2(*xs):
 return [2 * x for x in xs]
def add_3(*xs):
 return [x + 3 for x in xs]
cache.run(multiply_by_2, 1, 2, 3) #=> [2, 4, 6]
cache.run(multiply_by_2, 3, 4, 5) #=> [2, 4, 6]
cache.run(add_3, 1, 2, 3) #=> [2, 4, 6]
```
That the function may be changed can be prevented by providing the process to the constructor, i.e. associating a cache file with a function before your cache is queried. A different function would then get a different cache.

That the data may vary between invocations is difficult to prevent unless you also store the input data in the file. You may be interested in looking at the functools.lru_cache() decorator that can cache the last n calls in memory, and looks up the correct cache entry by argument values.
If the cache object decorates an existing function (and effectively just offers a single public method run(), then it may be convenient to let the cache also offer a function-like interface: either by implementing a __call__() method, or by actually just using nested functions. With the latter approach, you might implement a decorator like:
```
def cache(filename, forcecreate=False):
 def inner(process):
 def cached(*input):
 ... # basically your run() method
 return cached
 return inner
```
Used like:
```
@cache("foo.data")
def cache_me(a, b, c):
 ...
result = cache_me(1, 2, 3)
```
or:
```
cached_multiply_by_2 = cache("multiply by 2.data")(multiply_by_2)
result = cached_multiply_by_2(1, 2, 3)
```
It is also possible to do this with an object with a __call__() method. Such an object is actually preferable for debugging since you can inspect the properties of an object.

Other notes:

The self.t field is never read. It can be safely removed.
While __post() reads from self.output, the corresponding __pre() method does not write to that variable, returning a result instead. It would be better if either both methods directly access the self.output state or both methods are "pure" so that __post() would take an parameter.
The names of pre and post are slightly confusing. I would associate them with asserting preconditions and postconditions. Instead, they load() and store() cached data.
Your cache format is unable to cache a None value. When a None is loaded, the process is re-run again. Instead, load/store the data within a container, e.g. a single-element list. Then an empty list indicates the absence of a value, whereas a single-element list (even if that element is None) indicates the presence of a value.
if not self.output is None: would usually be written as if self.output is not None:.
The outer parentheses in if (Path(self.filename).is_file()): are unnecessary.
Output like print("File exists") deep inside your code is not a good idea. If you want to do logging, use a dedicated logging function that is switched off by default. If you output logging, prefer writing to sys.stderr rather than sys.stdout.
any([i is None for i in inputs]) creates a list unnecessarily. A generator expression is sufficient: any(i is None for i in inputs).
It is weird that your class checks the arguments for None-ness. If the wrapped process() cannot handle nones it should check that itself. Or you could define a separate decorator that you could apply independently:
```
def all_not_none(function):
 @functools.wraps(function)
 def inner(*args):
 if any(x is None for x in args):
 raise ValueError(...)
 return function(*args)
 return inner
```
(Using the functools.wraps decorator fixes docstrings and names of the returned function object to imitate the original function more closely.)

Question 3

First, thanks for the list of "other notes". It's really helpful. Your overall answer covers really advanced topics. I did not think of cache in this class. I did not realised it uses it. My first aim was to read data from disk if available or calculate and store. I read through your answer but I feel like I need to read it again. Also, I need to get more familiar with functools which I was not aware of before. I should perhaps read a good book to understand other more advanced techniques in Python. Thanks.

amon amonamon 12.7k37 silver badges67 bronze badges · Answer 1 · 2017-11-12 12:58:12Z

The strategy design pattern is useful mostly when you want to parametrize an algorithm with different behaviours. That implies you would have more than one strategy, which is not the case here. As such, calling this a strategy might be confusing.

Instead, it might be better to view your class as a decorator that can cache the result of a computation in a file. However, that would raise two questions:

Why is the cached data independent from the process that can produce the data, and why is the cached data independent of the input? For example, this behaviour might not be expected:
```
def multiply_by_2(*xs):
 return [2 * x for x in xs]
def add_3(*xs):
 return [x + 3 for x in xs]
cache.run(multiply_by_2, 1, 2, 3) #=> [2, 4, 6]
cache.run(multiply_by_2, 3, 4, 5) #=> [2, 4, 6]
cache.run(add_3, 1, 2, 3) #=> [2, 4, 6]
```
That the function may be changed can be prevented by providing the process to the constructor, i.e. associating a cache file with a function before your cache is queried. A different function would then get a different cache.

That the data may vary between invocations is difficult to prevent unless you also store the input data in the file. You may be interested in looking at the functools.lru_cache() decorator that can cache the last n calls in memory, and looks up the correct cache entry by argument values.
If the cache object decorates an existing function (and effectively just offers a single public method run(), then it may be convenient to let the cache also offer a function-like interface: either by implementing a __call__() method, or by actually just using nested functions. With the latter approach, you might implement a decorator like:
```
def cache(filename, forcecreate=False):
 def inner(process):
 def cached(*input):
 ... # basically your run() method
 return cached
 return inner
```
Used like:
```
@cache("foo.data")
def cache_me(a, b, c):
 ...
result = cache_me(1, 2, 3)
```
or:
```
cached_multiply_by_2 = cache("multiply by 2.data")(multiply_by_2)
result = cached_multiply_by_2(1, 2, 3)
```
It is also possible to do this with an object with a __call__() method. Such an object is actually preferable for debugging since you can inspect the properties of an object.

Other notes:

The self.t field is never read. It can be safely removed.
While __post() reads from self.output, the corresponding __pre() method does not write to that variable, returning a result instead. It would be better if either both methods directly access the self.output state or both methods are "pure" so that __post() would take an parameter.
The names of pre and post are slightly confusing. I would associate them with asserting preconditions and postconditions. Instead, they load() and store() cached data.
Your cache format is unable to cache a None value. When a None is loaded, the process is re-run again. Instead, load/store the data within a container, e.g. a single-element list. Then an empty list indicates the absence of a value, whereas a single-element list (even if that element is None) indicates the presence of a value.
if not self.output is None: would usually be written as if self.output is not None:.
The outer parentheses in if (Path(self.filename).is_file()): are unnecessary.
Output like print("File exists") deep inside your code is not a good idea. If you want to do logging, use a dedicated logging function that is switched off by default. If you output logging, prefer writing to sys.stderr rather than sys.stdout.
any([i is None for i in inputs]) creates a list unnecessarily. A generator expression is sufficient: any(i is None for i in inputs).
It is weird that your class checks the arguments for None-ness. If the wrapped process() cannot handle nones it should check that itself. Or you could define a separate decorator that you could apply independently:
```
def all_not_none(function):
 @functools.wraps(function)
 def inner(*args):
 if any(x is None for x in args):
 raise ValueError(...)
 return function(*args)
 return inner
```
(Using the functools.wraps decorator fixes docstrings and names of the returned function object to imitate the original function more closely.)

First, thanks for the list of "other notes". It's really helpful. Your overall answer covers really advanced topics. I did not think of cache in this class. I did not realised it uses it. My first aim was to read data from disk if available or calculate and store. I read through your answer but I feel like I need to read it again. Also, I need to get more familiar with functools which I was not aware of before. I should perhaps read a good book to understand other more advanced techniques in Python. Thanks.

Stack Exchange Network

How to reduce responsibility of the main function in strategy pattern (python)

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

How to reduce responsibility of the main function in strategy pattern (python)

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions