process numerical input arguments of mixed ints, floats, and "array-like" things of same length>1

Question 1

I'm trying to process numerical input arguments for a function. Users can mix ints, floats, and "array-like" things of ints and floats, as long as they are all of length 1 or share the same length > 1.

Currently I convert them as follows:

convert to float                               convert to np.ndarray; dtype==float
---------------------------------------------  ---------------------------------------------
float
int
np.float64
range object resulting in len == 1             range object resulting in len > 1
len==1 list or tuple of int or float           len > 1 list or tuple of ints & floats
len==1 np.ndarray of dtype np.int or np.float  len > 1 np.ndarray of dtype np.int or np.float

I then test all resulting arrays to make sure they are the same length. If so, I return a list containing floats and arrays; if not, I return None.

I want to avoid up-conversion of booleans and small byte-length ints.

The script below appears to do what I want by brute force testing, but I wonder if there is a better way?

Desired behaviors:

fixem(some_bad_things) returns None
fixem(all_good_things) returns [42.0, 3.14, 3.141592653589793, 2.718281828459045, 3.0, 42.0, 3.0, 42.0, array([1. , 2.3, 2. ]), array([3.14, 1. , 4. ]), array([1., 2., 2.]), array([3., 1., 4.]), array([0., 1., 2.]), array([0, 1, 2])]
and sum(fixem(all_good_things)) returns array([149.13987448, 149.29987448, 156.99987448])

def fixit(x):
    result = None
    if type(x) in (int, float, np.float64):
        result = float(x)
    elif type(x) == range:
        y = list(x)
        if all([type(q) in (int, float) for q in y]):
            if len(y) == 1:
                result = float(y[0])
            elif len(y) > 1:
                result = np.array(y)
    elif type(x) in (tuple, list) and all([type(q) in (int, float) for q in x]):
        y = np.array(x)
        if y.dtype in (int, float):
            if len(y) == 1:
                result = float(y[0])
            elif len(y) > 1:
                result = y.astype(float)
    # np.int and np.float were plain aliases of int and float (removed in NumPy 1.24)
    elif (type(x) == np.ndarray and len(x.shape) == 1 and
          x.dtype in (int, float)):
        if len(x) == 1:
            result = float(x[0])
        elif len(x) > 1:
            result = x.astype(float)
    return result

def fixem(things):
    final = None
    results = [fixit(thing) for thing in things]
    floats = [r for r in results if type(r) is float]
    arrays = [r for r in results if type(r) is np.ndarray]
    others = [r for r in results if type(r) not in (float, np.ndarray)]
    if len(others) == 0:
        if len(arrays) == 0 or len(set([len(a) for a in arrays]))==1: # none or all same length
            final = floats + arrays
    return final

import numpy as np

some_bad_things = ('123', False, None, True, 42, 3.14, np.pi, np.exp(1),
                   [1, 2.3, 2], (3.14, 1, 4), [1, 2, 2], (3, 1, 4),
                   (3,), [42], (3.,), [42.], np.array([True, False]),
                   np.array([False]), np.array(False), np.array('xyz'),
                   np.array(42), np.array(42.), np.arange(3.), range(3))
all_good_things = (42, 3.14, np.pi, np.exp(1), [1, 2.3, 2], (3.14, 1, 4),
                   [1, 2, 2], (3, 1, 4), (3,), [42], (3.,), [42.],
                   np.arange(3.), range(3))

for i, things in enumerate((some_bad_things, all_good_things)):
    print(i, fixem(things))

print(sum(fixem(all_good_things))) # confirm

Comment

I'm curious, where does this problem come from?

Comment

@Georgy I don't understand exactly how to answer your question. I'm writing a script for others to use that will do a calculation based on numerical input. If they are all floats, it will perform a calculation and return one object; if one or more are array-like, it returns a collection of them.

Answer

I suggest using isinstance instead of type to check the types of variables. You can read about it in detail here: "What are the differences between type() and isinstance()?" So, for example, instead of writing:
```
if type(x) in (int, float, np.float64):
```
you would write:
```
if isinstance(x, (int, float)):
```
You can check that it works for np.exp(1) which is of type np.float64.
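
One caveat is worth checking, since the question wants booleans rejected: `bool` is a subclass of `int`, so `isinstance(True, (int, float))` is true. Below is a minimal sketch of a stricter test. The helper name `is_plain_real` is hypothetical and not part of the original code.

```python
import numpy as np

def is_plain_real(x):
    """Return True for int/float scalars (including np.float64), but not bool."""
    # bool subclasses int, so it has to be excluded explicitly.
    return isinstance(x, (int, float)) and not isinstance(x, bool)

print(is_plain_real(42))
print(is_plain_real(np.exp(1)))  # np.float64 subclasses the builtin float
print(is_plain_real(True))
print(is_plain_real("123"))
```

Whether to exclude `bool` this way or in a later dtype check is a design choice; the point is only that plain `isinstance` does not do it for you.
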
When x is of type range, the following check is redundant:
```
if all([type(q) in (int, float) for q in y])
```
as the elements of y will always be integers. Also, there is no need to convert the range to a list. The following will also work:
```
result = float(x[0]) if len(x) == 1 else np.array(x)
```
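
As a quick check of this claim, a `range` supports `len()` and indexing directly, and `np.array` accepts it without an intermediate list. The sketch below uses a hypothetical helper name, `convert_range`, and adds an `.astype(float)` call on the assumption that a float array is wanted here, as in the other branches.

```python
import numpy as np

def convert_range(r):
    """Convert a range to a float (length 1) or to a float array otherwise."""
    if len(r) == 1:
        return float(r[0])  # range objects can be indexed directly
    # Assumption: a float dtype is wanted, matching the list/tuple branches.
    return np.array(r).astype(float)

print(convert_range(range(5, 6)))
print(convert_range(range(3)))
```

Note that an empty range ends up as an empty array in this sketch; reject it separately if that matters.
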
To check if a list is empty in Python we usually write:
```
if not others:
```
instead of:
```
if len(others) == 0:
```
Imports should be at the top of the script. Move the import numpy as np line there.

In the future, when you have a function that accepts arguments of different types and behaves differently depending on the type it gets, you could try using functools.singledispatch. I came up with the following implementation:

from collections.abc import Iterable
from functools import singledispatch
from numbers import Real

import numpy as np

@singledispatch
def fixit(x):
    # Fallback for any type without a registered implementation
    return None

@fixit.register
def _(x: Real):
    return float(x)

@fixit.register
def _(x: Iterable):
    y = np.array(x)
    # np.int and np.float were plain aliases of int and float (removed in NumPy 1.24)
    if y.dtype in (int, float) and len(y.shape) == 1:
        return float(y[0]) if len(y) == 1 else y.astype(float)
    else:
        return None

I didn't check it thoroughly, but it works for your test cases (although it looks a bit ugly).
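
To see how the dispatch picks an implementation, here is a small standalone sketch. The function `describe` is hypothetical and exists only for illustration. It shows two things to keep in mind: strings are iterable, and `bool` counts as a `Real`, so the dtype check inside the `Iterable` branch (and some handling of booleans) is still needed.

```python
from collections.abc import Iterable
from functools import singledispatch
from numbers import Real

@singledispatch
def describe(x):
    # Used when no registered type matches.
    return "unsupported"

@describe.register(Real)
def _(x):
    return "real number"

@describe.register(Iterable)
def _(x):
    return "iterable"

for value in (3, 2.5, [1, 2], "123", True, None):
    print(repr(value), "->", describe(value))
```
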

A better way to solve your problem would be to convert all the values to NumPy arrays of at least one dimension right away. The np.atleast_1d function does exactly that:

def to_normalized_data(values):
    arrays = list(map(np.atleast_1d, values))
    sizes = set(map(np.size, arrays))
    has_bad_types = any(array.dtype not in (np.int32, np.float64) for array in arrays)
    if len(sizes) > 2 or has_bad_types:
        return None
    max_size = max(sizes)
    singletons = [float(array[0]) for array in arrays if array.size == 1]
    iterables = [array.astype(float) for array in arrays if array.size == max_size]
    return singletons + iterables

>>> to_normalized_data(all_good_things)
[42.0,
 3.14,
 3.141592653589793,
 2.718281828459045,
 3.0,
 42.0,
 3.0,
 42.0,
 array([1. , 2.3, 2. ]),
 array([3.14, 1. , 4. ]),
 array([1., 2., 2.]),
 array([3., 1., 4.]),
 array([0., 1., 2.]),
 array([0., 1., 2.])]
>>> sum(to_normalized_data(all_good_things))
array([149.13987448, 149.29987448, 156.99987448])
>>> print(to_normalized_data(some_bad_things))
None
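
For reference, this is roughly what `np.atleast_1d` does to the different kinds of input. It is a small sketch for illustration, not part of the function above: scalars and 0-d arrays become length-1 arrays, while sequences keep their length.

```python
import numpy as np

for value in (42, 3.14, [1, 2.3, 2], (3,), np.array(42.0), range(3)):
    arr = np.atleast_1d(value)
    # Every result has at least one dimension, so .size and indexing work uniformly.
    print(repr(value), "->", arr.shape, arr.dtype)
```
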

Answering your comments:

> For some reason my anaconda's numpy (1.17.3) returns int64 rather than int32 as you have in item 6

It looks like this behavior is OS-specific. A better way to check the types of the obtained arrays would probably be np.can_cast. So, instead of writing:
```
has_bad_types = any(array.dtype not in (np.int32, np.float64) for array in arrays)
```
we could write:
```
has_bad_types = not all(np.can_cast(array.dtype, np.float64) for array in arrays)
```
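
One thing to be aware of with this check: under NumPy's default "safe" casting rule, boolean dtypes can also be cast to float64, so `np.can_cast` alone does not reject boolean arrays. The sketch below compares it with a stricter test on `dtype.kind`. The helper `is_int_or_float` is hypothetical, and it does not address the wish to reject small integer types.

```python
import numpy as np

def is_int_or_float(dtype):
    """Accept only signed/unsigned integer and floating-point dtypes."""
    return np.dtype(dtype).kind in "iuf"

candidates = (np.int32, np.int64, np.float64, np.bool_,
              np.dtype("<U3"), np.dtype(object))
for dtype in candidates:
    # Columns: dtype, result of np.can_cast, result of the stricter kind test
    print(np.dtype(dtype), np.can_cast(dtype, np.float64), is_int_or_float(dtype))
```
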
> Item #6 doesn't reject two different lengths if no singletons are present. With ([1, 2, 3], [1, 2, 3, 4]) as input, the output is [array([1., 2., 3., 4.])] and [1, 2, 3] just falls through the cracks and disappears

Welp, I missed this case... We can add it back as:
```
if len(sizes) > 2 or (len(sizes) == 2 and 1 not in sizes) or has_bad_types:
```
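
The rule from the question (every input has length 1, or all longer inputs share one length) can also be tested on the set of sizes alone. Here is an alternative formulation as a minimal sketch, with a hypothetical helper name `lengths_compatible`. It treats input without any singletons, such as two length-3 lists, as valid.

```python
def lengths_compatible(sizes):
    """True if, ignoring size 1, at most one distinct size remains."""
    return len(set(sizes) - {1}) <= 1

for sizes in ({1, 3}, {3}, {1}, {3, 4}, {1, 3, 4}):
    print(sorted(sizes), lengths_compatible(sizes))
```
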

> Also, my original script tested len(x.shape) == 1 in order to reject arrays with ndim > 1, which item #6 doesn't, but that can easily be added back by testing for set(map(np.ndim, arrays)) == set((1,)). This is important because np.size won't distinguish between a length-4 1D array and a 2x2 array.

Yep, that's right. Taking all the above into account, the final code could look like this:

def to_normalized_data(values):
    arrays = list(map(np.atleast_1d, values))
    sizes = set(map(np.size, arrays))
    have_bad_types = not all(np.can_cast(array.dtype, np.float64) for array in arrays)
    # Any set of dimensions other than exactly {1} means some array is not 1-D
    have_several_dimensions = set(map(np.ndim, arrays)) != {1}
    # Apart from size 1, at most one other size may occur
    have_mixed_lengths = len(sizes - {1}) > 1
    if have_mixed_lengths or have_bad_types or have_several_dimensions:
        return None
    singletons = [float(array[0]) for array in arrays if array.size == 1]
    # Select by "not a singleton" so all-scalar input is not returned twice
    iterables = [array.astype(float) for array in arrays if array.size != 1]
    return singletons + iterables
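
The dimension test matters because `np.size` counts elements regardless of shape. The short sketch below illustrates the point raised in the comment. It also shows that a proper-superset comparison such as `set(...) > {1}` would not flag input whose arrays are all 2-D, whereas `!= {1}` does.

```python
import numpy as np

flat = np.arange(4.0)                  # 1-D, length 4
square = np.arange(4.0).reshape(2, 2)  # 2-D, 2x2

print(np.size(flat), np.size(square))  # same element count
print(np.ndim(flat), np.ndim(square))  # different number of dimensions

dims = {np.ndim(square)}
print(dims != {1})  # inequality flags the 2-D array
print(dims > {1})   # proper-superset test does not
```
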

Comment

FYI, I've just asked "Managing user defined and default values for instance attributes".

Comment

Bingo! Excellent, thank you for your time and effort!

Georgy · Accepted Answer · 2019-11-29 15:56:54Z

