I'm trying to process numerical input arguments for a function. Users can mix ints, floats, and "array-like" things of ints and floats, as long as they are all length=1 or the same length>1.
Now I convert them as follows:
convert to float                               convert to np.ndarray; dtype==float
---------------------------------------------  ----------------------------------------------
float
int
np.float64
range object resulting in len == 1             range object resulting in len > 1
len==1 list or tuple of int or float           len > 1 list or tuple of ints & floats
len==1 np.ndarray of dtype np.int or np.float  len > 1 np.ndarray of dtype np.int or np.float
Then I test all resulting arrays to make sure they are the same length. If so, I return a list containing floats and arrays. If not, I return `None`.
I want to avoid up-conversion of booleans and small byte-length ints.
The script below appears to do what I want by brute force testing, but I wonder if there is a better way?
Desired behaviors:

`fixem(some_bad_things)` returns `None`

`fixem(all_good_things)` returns `[42.0, 3.14, 3.141592653589793, 2.718281828459045, 3.0, 42.0, 3.0, 42.0, array([1. , 2.3, 2. ]), array([3.14, 1. , 4. ]), array([1., 2., 2.]), array([3., 1., 4.]), array([0., 1., 2.]), array([0, 1, 2])]`

and `sum(fixem(all_good_things))` returns `array([149.13987448 149.29987448 156.99987448])`

def fixit(x):
    result = None
    if type(x) in (int, float, np.float64):
        result = float(x)
    elif type(x) == range:
        y = list(x)
        if all([type(q) in (int, float) for q in y]):
            if len(y) == 1:
                result = float(y[0])
            elif len(y) > 1:
                result = np.array(y)
    elif type(x) in (tuple, list) and all([type(q) in (int, float) for q in x]):
        y = np.array(x)
        if y.dtype in (int, float):
            if len(y) == 1:
                result = float(y[0])
            elif len(y) > 1:
                result = y.astype(float)
    elif (type(x) == np.ndarray and len(x.shape) == 1 and
          x.dtype in (np.int, np.float)):
        if len(x) == 1:
            result = float(x[0])
        elif len(x) > 1:
            result = x.astype(float)
    return result

def fixem(things):
    final = None
    results = [fixit(thing) for thing in things]
    floats = [r for r in results if type(r) is float]
    arrays = [r for r in results if type(r) is np.ndarray]
    others = [r for r in results if type(r) not in (float, np.ndarray)]
    if len(others) == 0:
        if len(arrays) == 0 or len(set([len(a) for a in arrays]))==1: # none or all same length
            final = floats + arrays
    return final

import numpy as np

some_bad_things = ('123', False, None, True, 42, 3.14, np.pi, np.exp(1),
                   [1, 2.3, 2], (3.14, 1, 4), [1, 2, 2], (3, 1, 4),
                   (3,), [42], (3.,), [42.], np.array([True, False]),
                   np.array([False]), np.array(False), np.array('xyz'),
                   np.array(42), np.array(42.), np.arange(3.), range(3))

all_good_things = (42, 3.14, np.pi, np.exp(1), [1, 2.3, 2], (3.14, 1, 4),
                   [1, 2, 2], (3, 1, 4), (3,), [42], (3.,), [42.],
                   np.arange(3.), range(3))

for i, things in enumerate((some_bad_things, all_good_things)):
    print(i, fixem(things))

print(sum(fixem(all_good_things))) # confirm

- I'm curious, where does this problem come from? – Georgy, Nov 29, 2019 at 13:26
- @Georgy I don't understand exactly how to answer your question. I'm writing a script for others to use that will do a calculation based on numerical input. If they are all floats, it will perform a calculation and return one object; if one or more are array-like, it returns a collection of them. – uhoh, Nov 29, 2019 at 13:45

1 Answer

1. I suggest using `isinstance` instead of `type` to check the types of variables. You can read about it in detail here: What are the differences between type() and isinstance()? So, for example, instead of writing:

        if type(x) in (int, float, np.float64):

    you would write:

        if isinstance(x, (int, float)):

    You can check that it works for `np.exp(1)`, which is of type `np.float64`.

2. When `x` is of type `range`, the following check is redundant:

        if all([type(q) in (int, float) for q in y])

    as the elements of `y` will always be integers. Also, there is no need to convert `range` to `list`. The following will also work:

        result = float(x[0]) if len(x) == 1 else np.array(x)

3. To check if a list is empty in Python we usually write:

        if not others:

    instead of:

        if len(others) == 0:

4. Imports should be at the top of the script. Move the `import numpy as np` line there.

5. In the future, when you have a function that can accept variables of different types and its behavior depends on which type it gets, you could try using `singledispatch`. I could come up with the following implementation:

        from collections.abc import Iterable
        from functools import singledispatch
        from numbers import Real

        import numpy as np


        @singledispatch
        def fixit(x):
            return None


        @fixit.register
        def _(x: Real):
            return float(x)


        @fixit.register
        def _(x: Iterable):
            y = np.array(x)
            if y.dtype in (np.int, np.float) and len(y.shape) == 1:
                return float(y[0]) if len(y) == 1 else y.astype(float)
            else:
                return None

    I didn't check it thoroughly, but it works for your test cases (though it looks a bit ugly).

6. A better way to solve your problem would be to immediately convert all the values to NumPy arrays of at least one dimension. We would need the `np.atleast_1d` function for that:

        def to_normalized_data(values):
            arrays = list(map(np.atleast_1d, values))
            sizes = set(map(np.size, arrays))
            has_bad_types = any(array.dtype not in (np.int32, np.float64) for array in arrays)
            if len(sizes) > 2 or has_bad_types:
                return None
            max_size = max(sizes)
            singletons = [float(array[0]) for array in arrays if array.size == 1]
            iterables = [array.astype(float) for array in arrays if array.size == max_size]
            return singletons + iterables

        >>> to_normalized_data(all_good_things)
        [42.0, 3.14, 3.141592653589793, 2.718281828459045, 3.0, 42.0, 3.0, 42.0, array([1. , 2.3, 2. ]), array([3.14, 1. , 4. ]), array([1., 2., 2.]), array([3., 1., 4.]), array([0., 1., 2.]), array([0., 1., 2.])]
        >>> sum(to_normalized_data(all_good_things))
        array([149.13987448, 149.29987448, 156.99987448])
        >>> print(to_normalized_data(some_bad_things))
        None

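A side note on the `isinstance` and `singledispatch` suggestions above: `bool` is a subclass of `int` (and therefore also counts as `numbers.Real`), so both `isinstance(x, (int, float))` and a handler registered for `Real` accept `True` and `False`, which the question wanted to reject. Below is a minimal sketch of one way around that, assuming booleans should map to `None`. The function name `to_float` is hypothetical, and the sketch relies on `singledispatch` choosing the most specific registered class.

```python
from functools import singledispatch
from numbers import Real


@singledispatch
def to_float(x):
    # Fallback for unsupported types such as str or None.
    return None


@to_float.register(Real)
def _(x):
    # Covers int and float, and subclasses of float such as np.float64.
    return float(x)


@to_float.register(bool)
def _(x):
    # bool is more specific than Real, so this handler wins for True/False.
    return None


print(isinstance(True, int))  # True, because bool subclasses int
print(to_float(42), to_float(3.14), to_float(True), to_float("123"))
# prints: 42.0 3.14 None None
```

With plain `isinstance` checks, the same effect could be obtained by testing `isinstance(x, bool)` before the numeric check.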
Answering your comments:

> For some reason my anaconda's numpy (1.17.3) returns int64 rather than int32 as you have in item 6

Looks like this behavior is OS-specific. Probably a better way to check the types of the obtained arrays would be by using `np.can_cast`. So, instead of writing:

    has_bad_types = any(array.dtype not in (np.int32, np.float64) for array in arrays)

we could write:

    has_bad_types = not all(np.can_cast(array.dtype, np.float64) for array in arrays)

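One caveat with this check, offered as a side note: `np.can_cast` uses "safe" casting by default, and under that rule booleans and small integer types also cast to `float64`, so on its own it does not keep out the inputs the question wants to reject. The sketch below compares it with an exact dtype comparison against the platform defaults. The helper name `is_default_numeric` is hypothetical, and whether an exact match is the right policy is an assumption based on the question's wording.

```python
import numpy as np

# Platform default integer and float dtypes, i.e. what np.array([1, 2]) and
# np.array([1.0, 2.0]) produce on the current machine.
DEFAULT_DTYPES = (np.dtype(int), np.dtype(float))


def is_default_numeric(array):
    """Hypothetical helper: True only for default-int or default-float arrays."""
    return array.dtype in DEFAULT_DTYPES


samples = {
    "bool": np.array([True, False]),
    "int8": np.array([1, 2], dtype=np.int8),
    "default int": np.array([1, 2]),
    "float64": np.array([1.0, 2.0]),
    "str": np.array(["a", "b"]),
}

# Columns: name, result of the can_cast check, result of the exact dtype check.
for name, array in samples.items():
    print(name, np.can_cast(array.dtype, np.float64), is_default_numeric(array))
```

Comparing against `np.dtype(int)` also sidesteps the `int32` versus `int64` difference between platforms, and it avoids the `np.int` and `np.float` aliases, which were removed in NumPy 1.24.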
> Item #6 doesn't reject two different lengths if no singletons are present. With `([1, 2, 3], [1, 2, 3, 4])` as input, the output is `[array([1., 2., 3., 4.])]` and `[1, 2, 3]` just falls through the cracks and disappears

Welp, I missed this case... We can add it back as:

    if len(sizes) > 2 or 1 not in sizes or has_bad_types:

> also my original script tested if `len(x.shape) == 1` in order to reject ndim > 1 arrays, which item #6 doesn't, but that can be easily added back with something like testing for `set(map(np.ndim, arrays)) == set((1,))`. This is important because `np.size` won't distinguish between a length=4 1D array and a 2x2 array.

Yep, that's right. Taking all the above into account, the final code could look like this:

    def to_normalized_data(values):
        arrays = list(map(np.atleast_1d, values))
        sizes = set(map(np.size, arrays))
        have_bad_types = not all(np.can_cast(array.dtype, np.float64) for array in arrays)
        have_several_dimensions = set(map(np.ndim, arrays)) > {1}
        if len(sizes) > 2 or 1 not in sizes or have_bad_types or have_several_dimensions:
            return None
        max_size = max(sizes)
        singletons = [float(array[0]) for array in arrays if array.size == 1]
        iterables = [array.astype(float) for array in arrays if array.size == max_size]
        return singletons + iterables

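Reading the final code, a few edge cases still appear to slip through:

- `1 not in sizes` rejects inputs that are all the same length but contain no singleton, such as `([1, 2, 3], [4, 5, 6])`.
- The proper-superset test `> {1}` is false when no array is one-dimensional, for example when every input is 2-D.
- When every input is a singleton, `max_size` is 1, so each value is returned twice: once as a float and once as an array.

The following is a hedged sketch of a variant that handles these cases, not the answer's original code. The name `normalize_strict` is hypothetical, and the exact-dtype policy is an assumption taken from the question's wish to avoid up-converting booleans and small ints.

```python
import numpy as np


def normalize_strict(values):
    """Hypothetical variant of to_normalized_data with stricter checks."""
    # Note: np.atleast_1d promotes scalars and 0-d arrays to length-1 arrays.
    arrays = [np.atleast_1d(value) for value in values]
    # Assumption: only the platform default int and float dtypes are wanted,
    # which keeps out bool, small ints, strings and object arrays.
    allowed = (np.dtype(int), np.dtype(float))
    if any(a.ndim != 1 or a.size == 0 or a.dtype not in allowed for a in arrays):
        return None
    # Lengths of the non-singleton inputs; at most one distinct length may occur.
    long_sizes = {a.size for a in arrays if a.size > 1}
    if len(long_sizes) > 1:
        return None
    singletons = [float(a[0]) for a in arrays if a.size == 1]
    vectors = [a.astype(float) for a in arrays if a.size > 1]
    return singletons + vectors


print(normalize_strict(([1, 2, 3], [1, 2, 3, 4])))  # None: two different lengths
print(normalize_strict((1, 2.5)))                   # [1.0, 2.5]: no duplicates
print(normalize_strict((True, 1)))                  # None: bool is rejected
```

Splitting on `size > 1` rather than `size == max_size` is what avoids the duplication in the all-singleton case.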
- fyi I've just asked Managing user defined and default values for instance attributes – uhoh, Dec 3, 2019 at 9:19
- Bingo! Excellent, thank you for your time and effort! – uhoh, Dec 3, 2019 at 11:14