10
\$\begingroup\$

I am extending the functionality of a Python list and would like to include a method that normalizes a vector to the [0, 1] range using element-wise operations. I came up with this solution, but using two classes does not seem clean. The main motivation for the two classes is that data - min(data) in normalize() returns a plain Python list (because of how __sub__() is implemented), and the native list does not implement __truediv__().

How can I implement the normalize() method and avoid creating the intermediate _BaseList class? The project I am working on has very constrained memory and I cannot use NumPy.

class _BaseList(list):
    def __init__(self, data):
        super().__init__(data)
    def __sub__(self, value):
        if type(value) in (int, float):
            return [elem - value for elem in self]
        elif type(value) is list and len(value) == len(self):
            return [a - b for a, b in zip(value, self)]
    def __truediv__(self, value):
        if type(value) in (int, float):
            return [elem / value for elem in self]
        elif type(value) is list and len(value) == len(self):
            return [a / b for a, b in zip(value, self)]

class Array(_BaseList):
    def __init__(self, data=None):
        super().__init__(data)
    def normalize(self):
        print(type(self))
        return _BaseList((self - min(self))) / float(max(self) - min(self))
Jamal
asked Dec 24, 2017 at 6:03
\$\endgroup\$
3
  • 1
    \$\begingroup\$ Why do you need to extend list? Are there existing methods that you'll make extensive use of? \$\endgroup\$ Commented Dec 24, 2017 at 8:21
  • 1
    \$\begingroup\$ Also, is it only normalize that you need to add or do you plan on adding more methods? \$\endgroup\$ Commented Dec 24, 2017 at 8:50
  • \$\begingroup\$ Are you reimplementing NumPy? \$\endgroup\$ Commented Dec 24, 2017 at 10:27

1 Answer

13
\$\begingroup\$

You say you need the extra class "due to how __sub__() was implemented", that is, because its list comprehension returns a vanilla list. However, note:

  1. If you replace e.g.

    return [elem - value for elem in self]
    

    with:

    return _BaseList(elem - value for elem in self)
    

    __sub__ will return a _BaseList instead; and

  2. If you put those four methods in one class calling normalize would still work anyway, because you do convert to _BaseList after the subtraction.


In fact, three methods, because the __init__ is redundant - if all your subclass method does is call the superclass version, you can just let Python handle that for you.
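
As a minimal sketch of that point (the class body is deliberately empty, and the variable names are only illustrative), a list subclass with no __init__ of its own still accepts any iterable, or no argument at all:

```python
class Array(list):
    """No __init__ here: list's own constructor is inherited unchanged."""


# Construction works exactly as it does for a plain list.
from_tuple = Array((7, 8, 9))
empty = Array()
```

By contrast, the data=None default in the original Array.__init__ passes None on to list.__init__, which raises TypeError because None is not iterable.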


The Zen of Python states that

Errors should never pass silently.

However, in both __truediv__ and __sub__, if value is not one of the three specified types (or it's a list of the wrong length), they quietly return None. Instead, you should raise TypeError if the user passes a value that can't be handled.
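
One way this could look is sketched below. StrictList is a hypothetical name used only for illustration, and the choice of ValueError for a length mismatch is an assumption rather than something the original code prescribes:

```python
class StrictList(list):
    """Hypothetical list subclass whose __sub__ refuses unsupported operands."""

    def __sub__(self, value):
        if isinstance(value, (int, float)):
            # Scalar case: subtract the same number from every element.
            return StrictList(elem - value for elem in self)
        if isinstance(value, list):
            if len(value) != len(self):
                raise ValueError('operands must have the same length')
            # Element-wise case: self comes first, so this is self - value.
            return StrictList(a - b for a, b in zip(self, value))
        # Anything else is rejected loudly instead of returning None.
        raise TypeError(
            'unsupported operand type for -: {!r}'.format(type(value).__name__)
        )
```

With this version, subtracting a string or a dict fails immediately at the call site rather than producing a None that breaks something later.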


Speaking of specified types, don't write things like:

if type(value) in (int, float):

If you need to check types, use isinstance - this supports inheritance better:

if isinstance(value, (int, float)):

This is particularly important when you're using inheritance yourself - what happens if you try to subtract a _BaseList from another _BaseList? Python supplies abstract base classes that can help make this usage more generic.

Also note that there is another numeric type, complex. As Mathias Ettinger pointed out in the comments, you can use numbers.Number when you want to cover all three cases.
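
To make the difference concrete, here is a small sketch. Celsius is a made-up float subclass and the variable names are illustrative; the checks themselves use only the standard library:

```python
from numbers import Number


class Celsius(float):
    """Hypothetical float subclass, used only to illustrate the difference."""


reading = Celsius(21.5)

# An exact type comparison rejects the subclass...
exact_match = type(reading) in (int, float)            # False
# ...while isinstance accepts it.
subclass_aware = isinstance(reading, (int, float))     # True
# Number covers int, float and complex in a single check.
complex_is_number = isinstance(3 + 4j, Number)         # True
complex_is_real = isinstance(3 + 4j, (int, float))     # False
```

Be aware that bool is a subclass of int, so isinstance(True, Number) is also True; whether that matters depends on what your list is expected to hold.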


There's a lot of duplication between the __sub__ and __truediv__ implementations; you could factor this out by extracting a helper method that also takes the operation to apply, then pass in the functions defined in the operator module.


from collections.abc import Sequence
from operator import sub, truediv

class Array(list):
    def normalize(self):
        min_ = min(self)  # calculate this once
        return (self - min_) / (max(self) - min_)
    def __sub__(self, other):
        return self._process(other, sub)
    def __truediv__(self, other):
        return self._process(other, truediv)
    def _process(self, other, op):
        if isinstance(other, Sequence):  # element-wise case
            if len(other) == len(self):
                return Array(op(a, b) for a, b in zip(self, other))
            raise ValueError('cannot operate on a sequence of unequal length')
        return Array(op(a, other) for a in self)  # scalar case

In use:

>>> a = Array((7, 8, 9))
>>> a
[7, 8, 9]
>>> a - 2
[5, 6, 7]
>>> a - [1, 2, 3]
[6, 6, 6]
>>> a.normalize()
[0.0, 0.5, 1.0]
>>> a - 'abc'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 6, in __sub__
  File "<stdin>", line 12, in _process
  File "<stdin>", line 12, in <genexpr>
TypeError: unsupported operand type(s) for -: 'int' and 'str'
>>> a - [1, 2, 3, 4]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 6, in __sub__
  File "<stdin>", line 13, in _process
ValueError: cannot operate on a sequence of unequal length

It might also be worth implementing __repr__, so you can tell more easily when you have a vanilla list and when you have an Array.
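
A minimal sketch of such a __repr__ follows. The exact output format is my own choice, not something the standard library mandates, and the arithmetic methods are left out to keep the example short:

```python
class Array(list):
    # Only __repr__ is shown; the arithmetic methods are omitted for brevity.
    def __repr__(self):
        # Delegate to list.__repr__ for the contents and prefix the class name.
        return '{}({})'.format(type(self).__name__, list.__repr__(self))


labelled = repr(Array((7, 8, 9)))   # 'Array([7, 8, 9])'
plain = repr([7, 8, 9])             # '[7, 8, 9]'
```

Using type(self).__name__ rather than a hard-coded string means any further subclass is labelled with its own name. With this in place, the interactive session above would display Array([7, 8, 9]) instead of [7, 8, 9].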

answered Dec 24, 2017 at 9:07
\$\endgroup\$
1
  • \$\begingroup\$ Thank you very much for all the suggestions! Just adapted them for this specific constrained context, and they work great. \$\endgroup\$ Commented Dec 25, 2017 at 13:34
