I am extending the functionality of a Python list and I would like to include a method to normalize a vector to the [0, 1] range by using element-wise operations. I came up with this solution, but using two classes does not seem clean. The main motivation for using two classes is that the output of data - min(data) in normalize() is a plain Python list (due to how __sub__() was implemented), and a native list does not implement __truediv__().

How can I implement the normalize() method and avoid creating the intermediate _BaseList class? The project I am working on has very constrained memory, and I cannot use NumPy.
class _BaseList(list):
    def __init__(self, data):
        super().__init__(data)

    def __sub__(self, value):
        if type(value) in (int, float):
            return [elem - value for elem in self]
        elif type(value) is list and len(value) == len(self):
            return [a - b for a, b in zip(self, value)]

    def __truediv__(self, value):
        if type(value) in (int, float):
            return [elem / value for elem in self]
        elif type(value) is list and len(value) == len(self):
            return [a / b for a, b in zip(self, value)]

class Array(_BaseList):
    def __init__(self, data=()):
        super().__init__(data)

    def normalize(self):
        print(type(self))
        return _BaseList((self - min(self))) / float(max(self) - min(self))
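A quick check (a sketch, assuming CPython 3) confirms why the intermediate conversion is needed: the built-in list defines no subtraction or true-division operator, so the plain list returned by __sub__ cannot be divided directly.

```python
# The built-in list supports + and * (concatenation and repetition),
# but it defines neither __sub__ nor __truediv__.
print(hasattr(list, "__add__"))      # True
print(hasattr(list, "__sub__"))      # False
print(hasattr(list, "__truediv__"))  # False

try:
    [0, 1, 2] / 2.0
except TypeError as exc:
    # unsupported operand type(s) for /: 'list' and 'float'
    print(exc)
```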
You say you need the extra class "due to how __sub__() was implemented", because it returns a vanilla list built with a list comprehension. However, note that:

If you replace, e.g.,

    return [elem - value for elem in self]

with:

    return _BaseList(elem - value for elem in self)

then __sub__ will return a _BaseList instead; and

if you put those four methods in one class, calling normalize would still work anyway, because you do convert to _BaseList after the subtraction.

In fact, it is three methods, because the __init__ is redundant: if all your subclass method does is call the superclass version, you can just let Python handle that for you.
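Putting those two points together, a minimal single-class sketch could look like the following. For brevity it keeps only the scalar branch of the question's operators and drops the redundant __init__; using type(self)(...) instead of a hard-coded class name is an assumption on top of the original, so that subclasses also get their own type back.

```python
class Array(list):
    # No __init__ needed: list.__init__ already accepts any iterable.

    def __sub__(self, value):
        # Returning an Array (not a plain list) lets the result be divided next.
        return type(self)(elem - value for elem in self)

    def __truediv__(self, value):
        return type(self)(elem / value for elem in self)

    def normalize(self):
        lo = min(self)  # compute the minimum only once
        return (self - lo) / float(max(self) - lo)


v = Array([2, 4, 6])
print(type(v - 1).__name__)  # Array
print(v.normalize())         # [0.0, 0.5, 1.0]
```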
The Zen of Python states that "Errors should never pass silently."

However, in both __truediv__ and __sub__, if value is not one of the three specified types (or is a list of the wrong length), they quietly return None. Instead, you should raise a TypeError if the user passes a value that can't be handled (a ValueError fits better for a list of the wrong length).
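For arithmetic operators specifically, one common way to get that TypeError is to return NotImplemented for unsupported operands. This is Python's standard binary-operator protocol: the interpreter then tries the other operand's reflected method (e.g. __rsub__) and, if that also fails, raises TypeError itself. A length mismatch is a bad value rather than a bad type, so this sketch raises ValueError explicitly. The class name Vec is hypothetical.

```python
class Vec(list):
    def __sub__(self, other):
        if isinstance(other, (int, float)):
            return Vec(a - other for a in self)
        if isinstance(other, list):
            if len(other) != len(self):
                raise ValueError("cannot subtract lists of unequal length")
            return Vec(a - b for a, b in zip(self, other))
        # Unsupported type: Python tries other.__rsub__, then raises TypeError.
        return NotImplemented


v = Vec([3, 4])
print(v - 1)       # [2, 3]
print(v - [1, 1])  # [2, 3]
try:
    v - "ab"
except TypeError as exc:
    print(exc)     # unsupported operand type(s) for -: 'Vec' and 'str'
```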
Speaking of specified types, don't write things like:

    if type(value) in (int, float):

If you need to check types, use isinstance; it supports inheritance better:

    if isinstance(value, (int, float)):
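A small demonstration of the difference, using bool (which subclasses int) and a hypothetical float subclass called Celsius. Whether accepting bool is actually desirable is a separate design decision.

```python
class Celsius(float):
    """Hypothetical float subclass, used only for illustration."""


t = Celsius(21.5)

# An exact type check rejects subclasses...
print(type(t) in (int, float))         # False
print(type(True) in (int, float))      # False
# ...while isinstance accepts them.
print(isinstance(t, (int, float)))     # True
print(isinstance(True, (int, float)))  # True
```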
This is particularly important when you're using inheritance yourself: what happens if you try to subtract a _BaseList from another _BaseList? Python supplies abstract base classes (in collections.abc) that can help make this usage more generic.
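To make the _BaseList case concrete: the exact check type(value) is list is False for a _BaseList, so subtracting one from another falls through and returns None. A sketch with collections.abc.Sequence (the subclass name MyList is hypothetical) shows the more generic check, plus one caveat worth knowing: str is also a Sequence, which is why 'abc' in the session further down reaches the element-wise branch and only fails inside the subtraction.

```python
from collections.abc import Sequence


class MyList(list):
    """Hypothetical list subclass standing in for _BaseList."""


m = MyList([1, 2])

print(type(m) is list)               # False: exact check rejects the subclass
print(isinstance(m, list))           # True
print(isinstance(m, Sequence))       # True
print(isinstance((1, 2), Sequence))  # True: tuples are accepted as well
print(isinstance("ab", Sequence))    # True: strings are sequences, too
print(isinstance({1, 2}, Sequence))  # False: sets are not ordered sequences
```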
Also note that there is another numeric type, complex; as Mathias Ettinger pointed out in the comments, you can use numbers.Number when you want to cover all three cases.
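A sketch of that check with numbers.Number from the standard library. As an aside not raised in the original: normalize relies on min and max, which need ordered values, and complex numbers are not ordered, so numbers.Real may be the tighter check for that particular method.

```python
from decimal import Decimal
from fractions import Fraction
from numbers import Number, Real

values = [3, 2.5, 1 + 2j, Fraction(1, 3), Decimal("0.1")]

# Number covers int, float and complex (and the registered Fraction/Decimal).
print(all(isinstance(v, Number) for v in values))  # True
# (int, float) misses complex, Fraction and Decimal.
print([isinstance(v, (int, float)) for v in values])  # [True, True, False, False, False]
# Real excludes complex (and Decimal, which is registered only as a Number).
print([isinstance(v, Real) for v in values])  # [True, True, False, True, False]
```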
There's a lot of duplication between the __sub__ and __truediv__ implementations; you could factor it out by extracting a method that also takes the operator to apply, and then use the functions defined in the operator module.
from collections.abc import Sequence
from operator import sub, truediv

class Array(list):
    def normalize(self):
        min_ = min(self)  # calculate this once
        return (self - min_) / (max(self) - min_)
    def __sub__(self, other):
        return self._process(other, sub)
    def __truediv__(self, other):
        return self._process(other, truediv)
    def _process(self, other, op):
        if isinstance(other, Sequence):  # element-wise: pair up the items
            if len(other) == len(self):
                return Array(op(a, b) for a, b in zip(self, other))
            raise ValueError('cannot operate on a sequence of unequal length')
        return Array(op(a, other) for a in self)  # scalar: apply to every item
In use:
>>> a = Array((7, 8, 9))
>>> a
[7, 8, 9]
>>> a - 2
[5, 6, 7]
>>> a - [1, 2, 3]
[6, 6, 6]
>>> a.normalize()
[0.0, 0.5, 1.0]
>>> a - 'abc'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 6, in __sub__
  File "<stdin>", line 12, in _process
  File "<stdin>", line 12, in <genexpr>
TypeError: unsupported operand type(s) for -: 'int' and 'str'
>>> a - [1, 2, 3, 4]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 6, in __sub__
  File "<stdin>", line 13, in _process
ValueError: cannot operate on a sequence of unequal length
It might also be worth implementing __repr__, so you can tell more easily when you have a vanilla list and when you have an Array.
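A minimal __repr__ sketch, shown on a stripped-down Array that keeps only scalar subtraction so the example stays short, might be:

```python
class Array(list):
    def __sub__(self, other):
        # Scalar-only subtraction, kept just to show the repr of a result.
        return Array(a - other for a in self)

    def __repr__(self):
        # Reuse list's repr for the contents, then wrap it in the class name.
        return f"{type(self).__name__}({list.__repr__(self)})"


a = Array((7, 8, 9))
print(repr(a))       # Array([7, 8, 9])
print(repr(a - 2))   # Array([5, 6, 7])
print(repr([5, 6]))  # [5, 6]  (a vanilla list keeps the plain repr)
```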
Thank you very much for all the suggestions! I just adapted them for this specific constrained context, and they work great. (tulians, Dec 25, 2017 at 13:34)
normalize that you need to add, or do you plan on adding more methods?
numpy?