-
-
Notifications
You must be signed in to change notification settings - Fork 8k
MNT: fix __array__ to numpy #22975
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MNT: fix __array__ to numpy #22975
Conversation
This fails the unit handling...
That's tricky: so it seems that the Quantity
class being tested is not an ndarray
, but tries to behave similarly so it can be used as an array. But it also has an __array__
method as well which converts it to a plain ndarray
. I guess the problem is that the presence of __array__
can mean both "I'm not an array, so if you want an array, use this" or "I'm an array duck type, but if you want a real ndarray
, use this". I think ideally this whole routine would be something like
return x if isinstance(x, (np.ndarray, ndarray_ducktype)) else np.asarray(x)
But the real question is how to determine whether something is an ndarray_ducktype
...
085b1f9
to
6ccd2c1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to fix the problem. and see the test. However I am not sure if this is really on us, or the Kernel
, which is missing ndim
. Or maybe we should not be relying on ndim.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is ndim
always len(shape)
? (I guess not, but otherwise it make make sense to just use that instead?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 120 ndim
in the Python code base (although some are in tests or non-Python-code), so even though we probably should fix it here, in the long run it may be better that Kernel
gets ndim
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that Kernel
does not really properly behave like an ndarray
, so it does need to be converted to an array here. We're (ab)using the fact that it does not define ndim
as a proxy for "I'm not a proper duck type, just happen to have shape
". Really one would like the np.asduckarray
that has been discussed in numpy...
But the real question is how to determine whether something is an
ndarray_ducktype
...
That cannot be answered generally. Duck-typing is an operational concept: If it provides the functions I need, it's a duck-type to me. Duck types are never perfect (at the very least they'd fail isinstance()
behavior), but if I only need objects that quack like a duck I don't care whether it walks like a duck as well.
The only thing we could do is hard-code a list of known good-enough duck-types. Otherwise we'd have to query the object upfront for all the behavior we want which is practially impossible.
A third theoretical option would be to fully embrace ducktyping: Try with the object as is, if that fails convert the object to a numpy array and retry. However, that can lead to performance issues and more importantly our code is not designed for retries, so we might get into consistency issues if the first try partially modifies some state.
Just writing as I sort this out in my head.
Most of our methods do something like
x, y = self._process_unit_info([("x", x), ("y", y)], kwargs) x = np.ma.ravel(x) # or alternately np.asarray(x) y = np.ma.ravel(y)
by which point we have set the units, and x and y are now type ndarray
.
The ax.plot
method, on the other hand, needs to unpack x and y into individual arguments so we do:
x = _check_1d(xy[0]) y = _check_1d(xy[1]) self.axes.xaxis.update_units(x) self.axes.yaxis.update_units(y)
However, we do not want to lose units yet, and indeed we do not, at least according to test_units.py::test_numpy_facade
, which claims we have the ability to transform units at any time with ax.xaxis.set_units
. However, note that ax.xaxis.set_units
absolutely fails with scatter
(see #9713 (comment)) so dragging around the unit information only works for plot
currently.
Hence we are in a bit of a Gordian knot of compatibility. We have enshrined pint
as something we support, but only for plot
apparently. We allow __array__
but currently depend on it having ndim
and shape
. If an array doesn't have shape
then we np.atleast_1d
it. I suppose we can do the same thing to arrays without ndim
.
This works, but I really feel this is all a mess. Hopefully one of our RSE's can work on this...
Arrays sometimes don't have all the methods arrays should have, so add another check here. Plot requires both ndim and shape and this will extract the numpy array if x does not have those attributes. Otherwise leave the object alone, because unit support (currently only in plot) requires the object to retain the unit info.
e2da0f9
to
29ce2c2
Compare
@jklymak - indeed, also for astropy unit support works really well for plots, but not for images (cannot give units for an extent). Would be nice if that could also be exchanged. I think for the particular bug that triggered all this, we can also adjust on our side -- it is not something used all the time.
Just to clarify my comment, again while I am thinking about it - ideally plot
would do the same thing as the other methods and convert to numpy immediately after the units have been determined. But currently we cannot because plot
(and only plot
) tries to preserve unit information.
In my mind what the proper way to do this would be to
- determine units of the object and pass to the axis
- use the units to covert the unit-full object to numpy into some base unit for the axis (so if the converter allows 'feet', 'miles', and 'meters', convert to floats in meters).
- Allow the axis to scale its data in the units that the user wants (so if they specify 'feet', add a transform from meters to feet to the displayed data, but keep the underlying data in meters).
This way Matplotlib wouldn't need to drag around the extra units internally, it would just have a transform on each axis.
yes, that makes sense
I think the reason it's currently set up this way with Pint and plot()
is because plot()
is the only one I've personally needed to adjust axis units--whereas most of my use of other plot units is spatially, where units only matter for the z/image axis (e.g. imshow
, contour
).
I'm not sure how you can tweak the current conversion interface without breaking backwards compatibility...
I think my proposal would keep the same public user interface but change the internal representation of data passed in w units.
I think it could also be done so that most features work for downstream libraries the way they do now for scatter (not able to change axis units properly) but to have the plot-like full feature downstream libraries would need to add more logic to their locators and formatters.
See #23015 for more discussion of the convert-immediately strategy for dealing with units.
Despite the discussion above, I think this should go in as-is for 3.5.3....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should indeed go in for 3.5.3 to fix the regression.
...975-on-v3.5.x Backport PR #22975 on branch v3.5.x (MNT: fix __array__ to numpy)
Uh oh!
There was an error while loading. Please reload this page.
PR Summary
Second attempt: closes #22973
Arrays sometimes don't have all the methods arrays should have, so
add another check here. Plot requires both ndim and shape and this
will extract the numpy array if x does not have those attributes.
Otherwise leave the object alone, because unit support (currently only
in plot) requires the object to retain the unit info.
PR Checklist
Tests and Styling
pytest
passes).flake8-docstrings
and runflake8 --docstring-convention=all
).Documentation
doc/users/next_whats_new/
(follow instructions in README.rst there).doc/api/next_api_changes/
(follow instructions in README.rst there).