As I've described in a Stack Overflow question, I'm trying to rescale a NumPy array so that it fits into a certain range.
Here is the solution I currently use:
import numpy as np

def scale_array(dat, out_range=(-1, 1)):
    domain = [np.min(dat, axis=0), np.max(dat, axis=0)]

    def interp(x):
        # Map [0, 1] linearly onto out_range
        return out_range[0] * (1.0 - x) + out_range[1] * x

    def uninterp(x):
        # Map the data's [min, max] linearly onto [0, 1]
        b = 0
        if (domain[1] - domain[0]) != 0:
            b = domain[1] - domain[0]
        else:
            b = 1.0 / domain[1]
        return (x - domain[0]) / b

    return interp(uninterp(dat))

# np.float was removed in NumPy 1.24; the built-in float is equivalent
print(scale_array(np.array([-2, 0, 2], dtype=float)))
# Gives: [-1., 0., 1.]
print(scale_array(np.array([-3, -2, -1], dtype=float)))
# Gives: [-1., 0., 1.]
Is there a way to make this code cleaner? Is there a built-in function in NumPy or scikit-learn? This feels like a really common data pre-processing step and it feels weird that I keep re-implementing it.
2 Answers
NumPy provides numpy.interp for 1-dimensional linear interpolation. In this case, where you want to map the minimum element of the array to −1 and the maximum to +1, with the other elements mapped linearly in between, you can write:
np.interp(a, (a.min(), a.max()), (-1, +1))
For more advanced kinds of interpolation, there's scipy.interpolate.
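To make the one-liner above concrete, here is a minimal sketch that wraps it in a small helper and runs it on the two arrays from the question. The name rescale_interp and its out_range parameter are my own choices for illustration, not part of NumPy.

```python
import numpy as np

def rescale_interp(a, out_range=(-1.0, 1.0)):
    """Linearly map the min/max of `a` onto out_range using np.interp.

    Hypothetical helper name; the only NumPy call doing the work is np.interp.
    """
    a = np.asarray(a, dtype=float)
    return np.interp(a, (a.min(), a.max()), out_range)

print(rescale_interp([-2, 0, 2]))    # values -1, 0, 1
print(rescale_interp([-3, -2, -1]))  # values -1, 0, 1
print(rescale_interp([0, 5, 10], out_range=(0, 1)))  # values 0, 0.5, 1
```

Note that np.interp expects the x-coordinates of the data points to be increasing, so this sketch assumes the array is not constant (a.min() < a.max()).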
What you need here are basically two rescalings. The first centers the data on 0 by subtracting the midpoint of its range and dividing by the width of that range; the second scales and shifts the result to the out_range. Both can be written down directly, so there is no need for your inner functions and their special cases.
def scale(x, out_range=(-1, 1)):
    domain = np.min(x), np.max(x)
    y = (x - (domain[1] + domain[0]) / 2) / (domain[1] - domain[0])
    return y * (out_range[1] - out_range[0]) + (out_range[1] + out_range[0]) / 2
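As a quick check of the two steps, here is a self-contained sketch that spells them out with comments and runs them on one of the arrays from the question. The names lo, hi and centered are only there for readability; the arithmetic is the same as in the function above.

```python
import numpy as np

def scale(x, out_range=(-1, 1)):
    lo, hi = np.min(x), np.max(x)
    # Step 1: subtract the midpoint and divide by the width,
    # which puts the data into [-0.5, 0.5], symmetric around 0.
    centered = (x - (hi + lo) / 2) / (hi - lo)
    # Step 2: stretch to the width of out_range and shift to its midpoint.
    return centered * (out_range[1] - out_range[0]) + (out_range[1] + out_range[0]) / 2

x = np.array([-3.0, -2.0, -1.0])
print(scale(x))           # values -1, 0, 1
print(scale(x, (0, 10)))  # values 0, 5, 10
```

For x = [-3, -2, -1] the midpoint is -2 and the width is 2, so step 1 gives [-0.5, 0, 0.5] and step 2 with the default range multiplies by 2 and adds 0.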
Note that I removed the axis=0 arguments to np.min and np.max. By default they reduce over all axes. If that is not what you want, and you want to rescale only along some axis, I would make this a parameter of the scale function to give the user full control:
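def scale(x, out_range=(-1, 1), axis=None):
    # keepdims=True keeps the reduced axis so the result broadcasts against x for any axis
    domain = np.min(x, axis=axis, keepdims=True), np.max(x, axis=axis, keepdims=True)
    y = (x - (domain[1] + domain[0]) / 2) / (domain[1] - domain[0])
    return y * (out_range[1] - out_range[0]) + (out_range[1] + out_range[0]) / 2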
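To illustrate what the axis parameter buys you, here is a small sketch on a 2-D array. It assumes the minimum and maximum are computed with keepdims=True, so that the reduced axis stays in place and the result broadcasts against x whichever axis is chosen. The name scale_axis is hypothetical, chosen so that the example stands on its own.

```python
import numpy as np

def scale_axis(x, out_range=(-1, 1), axis=None):
    x = np.asarray(x, dtype=float)
    # Assumption: keepdims=True so that lo/hi broadcast against x for any axis.
    lo = np.min(x, axis=axis, keepdims=True)
    hi = np.max(x, axis=axis, keepdims=True)
    y = (x - (hi + lo) / 2) / (hi - lo)
    return y * (out_range[1] - out_range[0]) + (out_range[1] + out_range[0]) / 2

data = np.array([[0.0, 10.0, 20.0],
                 [1.0,  2.0,  3.0]])

rows = scale_axis(data, axis=1)   # each row scaled independently
cols = scale_axis(data, axis=0)   # each column scaled independently
whole = scale_axis(data)          # one global min/max for the whole array

print(rows)   # both rows become -1, 0, 1
print(cols)   # columns become (-1, 1), (1, -1), (1, -1)
print(whole)  # first row -1, 0, 1; second row -0.9, -0.8, -0.7
```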
This function behaves the same as yours, even with out_range = (-1, -1).
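One case where the two versions can differ is a constant input, where the maximum equals the minimum and the division is by zero. If you need to handle that, here is one possible sketch. Mapping constant data to the midpoint of out_range is my own assumption (the special case in your code appears to send it to out_range[0] instead), and the name scale_safe is hypothetical.

```python
import numpy as np

def scale_safe(x, out_range=(-1, 1), axis=None):
    x = np.asarray(x, dtype=float)
    lo = np.min(x, axis=axis, keepdims=True)
    hi = np.max(x, axis=axis, keepdims=True)
    span = hi - lo
    # Assumption: where the data is constant (span == 0), divide by 1 instead.
    # The numerator is 0 there, so those entries land on the midpoint of out_range.
    safe_span = np.where(span == 0, 1.0, span)
    y = (x - (hi + lo) / 2) / safe_span
    return y * (out_range[1] - out_range[0]) + (out_range[1] + out_range[0]) / 2

print(scale_safe([5, 5, 5]))          # all values 0, the midpoint of (-1, 1)
print(scale_safe([5, 5, 5], (0, 1)))  # all values 0.5
print(scale_safe([-2, 0, 2]))         # values -1, 0, 1, unchanged from before
```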