Python factory method with easy registry

Question 1

My aim is to define a set of classes, each providing methods for comparing a particular type of file. My idea is to use some kind of factory method to instantiate the class based upon a string, which could allow new classes to be added easily. Then it would be simple to loop over a dictionary like:

files = {
 'csv': ('file1.csv', 'file2.csv'),
 'bin': ('file3.bin', 'file4.bin')
}

Here is what I have so far:

# results/__init__.py
class ResultDiffException(Exception):
 pass
class ResultDiff(object):
 """Base class that enables comparison of result files."""
 def __init__(self, path_test, path_ref):
 self.path_t = path_test
 self.path_r = path_ref
 def max(self):
 raise NotImplementedError('abstract method')
 def min(self):
 raise NotImplementedError('abstract method')
 def mean(self):
 raise NotImplementedError('abstract method')
# results/numeric.py
import numpy as np
from results import ResultDiff, ResultDiffException
class NumericArrayDiff(ResultDiff):
 def __init__(self, *args, **kwargs):
 super(NumericArrayDiff, self).__init__(*args, **kwargs)
 self.data_t = self._read_file(self.path_t)
 self.data_r = self._read_file(self.path_r)
 if self.data_t.shape != self.data_r.shape:
 raise ResultDiffException('Inconsistent array shape')
 np.seterr(divide='ignore', invalid='ignore')
 self.diff = (self.data_t - self.data_r) / self.data_r
 both_zero_ind = np.nonzero((self.data_t == 0) & (self.data_r == 0))
 self.diff[both_zero_ind] = 0
 def _read_file(self, path):
 return np.loadtxt(path, ndmin=1)
 def max(self):
 return np.amax(self.diff)
 def min(self):
 return np.amin(self.diff)
 def mean(self):
 return np.mean(self.diff)
class CsvDiff(NumericArrayDiff):
 def __init__(self, *args, **kwargs):
 super(CsvDiff, self).__init__(*args, **kwargs)
 def _read_file(self, path):
 return np.loadtxt(path, delimiter=',', ndmin=1)
class BinaryNumericArrayDiff(NumericArrayDiff):
 def __init__(self, *args, **kwargs):
 super(BinaryNumericArrayDiff, self).__init__(*args, **kwargs)
 def _read_file(self, path):
 return np.fromfile(path)

As you can see, the classes CsvDiff and BinaryNumericArrayDiff have only very minor changes with respect to NumericArrayDiff, which could potentially be refactored using constructor arguments. The problem is that different file types would then require different constructor syntax, which would complicate the factory pattern.

I did also consider providing @classmethods to NumericArrayDiff, which could be put in a dict in order to link to the file types. However, I'm hoping for a more natural way of registering these classes to the factory.

Any advice would be much appreciated.

Question 2

1. Stop writing classes

The title for this section comes from Jack Diederich's PyCon 2012 talk.

A class represents a group of objects with similar behaviour, and an object represents some kind of persistent thing. So when deciding what classes a program is going to need, the first question to ask is, "what kind of persistent things does this program need to represent?"

In this case the program:

knows how to load NumPy arrays from different kinds of file format (CSV and plain text); and
knows how to compute the relative difference between two NumPy arrays (so long as they come from files with the same format).

The only persistent things here are files (represented by Python file objects) and NumPy arrays (represented by numpy.ndarray objects). So there's no need for any more classes.

2. Other review points

The code calls numpy.seterr to suppress the warning:
```
RuntimeWarning: invalid value encountered in true_divide
```
but it fails to restore the original error state, whatever it was. This might be an unpleasant surprise for the caller. It would be better to use the numpy.errstate context manager to ensure that the original error state is restored.
When dispatching to NumPy functions, it usually unncessary to check shapes for compatibility and raise your own error. Instead, just pass the arrays to NumPy. If they can't be combined, then NumPy will raise:
```
ValueError: operands could not be broadcast together ...
```

3. Revised code

Instead of classes, write a function!

import numpy as np
def relative_difference(t, r):
 """Return the relative difference between arrays t and r, that is:
 0 where t == 0 and r == 0
 (t - r) / r otherwise
 """
 t, r = np.asarray(t), np.asarray(r)
 with np.errstate(divide='ignore', invalid='ignore'):
 return np.where((t == 0) & (r == 0), 0, (t - r) / r)

Note the following advantages over the original code:

It's much shorter, and so there's much less code to maintain.
It can find the difference between arrays that come from files with different formats:
```
relative_difference(np.loadtxt(path1), np.fromfile(path2))
```
It can find the difference between arrays that don't come from files at all:
```
relative_difference(np.random.randint(0, 10, (10,)), np.arange(1, 11))
```

Gareth Rees Gareth Rees 50.1k3 gold badges130 silver badges210 bronze badges · Accepted Answer · 2015-05-21 14:00:53Z

1. Stop writing classes

The title for this section comes from Jack Diederich's PyCon 2012 talk.

A class represents a group of objects with similar behaviour, and an object represents some kind of persistent thing. So when deciding what classes a program is going to need, the first question to ask is, "what kind of persistent things does this program need to represent?"

In this case the program:

knows how to load NumPy arrays from different kinds of file format (CSV and plain text); and
knows how to compute the relative difference between two NumPy arrays (so long as they come from files with the same format).

The only persistent things here are files (represented by Python file objects) and NumPy arrays (represented by numpy.ndarray objects). So there's no need for any more classes.

2. Other review points

The code calls numpy.seterr to suppress the warning:
```
RuntimeWarning: invalid value encountered in true_divide
```
but it fails to restore the original error state, whatever it was. This might be an unpleasant surprise for the caller. It would be better to use the numpy.errstate context manager to ensure that the original error state is restored.
When dispatching to NumPy functions, it usually unncessary to check shapes for compatibility and raise your own error. Instead, just pass the arrays to NumPy. If they can't be combined, then NumPy will raise:
```
ValueError: operands could not be broadcast together ...
```

3. Revised code

Instead of classes, write a function!

import numpy as np
def relative_difference(t, r):
 """Return the relative difference between arrays t and r, that is:
 0 where t == 0 and r == 0
 (t - r) / r otherwise
 """
 t, r = np.asarray(t), np.asarray(r)
 with np.errstate(divide='ignore', invalid='ignore'):
 return np.where((t == 0) & (r == 0), 0, (t - r) / r)

Note the following advantages over the original code:

It's much shorter, and so there's much less code to maintain.
It can find the difference between arrays that come from files with different formats:
```
relative_difference(np.loadtxt(path1), np.fromfile(path2))
```
It can find the difference between arrays that don't come from files at all:
```
relative_difference(np.random.randint(0, 10, (10,)), np.arange(1, 11))
```

Stack Exchange Network

Python factory method with easy registry

1 Answer 1

1. Stop writing classes

2. Other review points

3. Revised code

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Python factory method with easy registry

1 Answer 1

1. Stop writing classes

2. Other review points

3. Revised code

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions