Search code, repositories, users, issues, pull requests...

Copy link

Member Author

jklymak commented May 4, 2022

This fails the unit handling...

tacaswell

tacaswell reviewed

May 4, 2022

lib/matplotlib/cbook/__init__.py Outdated Show resolved Hide resolved

@mhvk

Copy link

Contributor

mhvk commented May 4, 2022

That's tricky: so it seems that the Quantity class being tested is not an ndarray, but tries to behave similarly so it can be used as an array. But it also has an __array__ method as well which converts it to a plain ndarray. I guess the problem is that the presence of __array__ can mean both "I'm not an array, so if you want an array, use this" or "I'm an array duck type, but if you want a real ndarray, use this". I think ideally this whole routine would be something like

return x if isinstance(x, (np.ndarray, ndarray_ducktype)) else np.asarray(x)

But the real question is how to determine whether something is an ndarray_ducktype...

@jklymak jklymak force-pushed the mnt-fix-array-numpy-second-try branch 2 times, most recently from 085b1f9 to 6ccd2c1 Compare

May 5, 2022 04:38

jklymak

jklymak commented

May 5, 2022

lib/matplotlib/cbook/__init__.py

x = _unpack_to_numpy(x)

if not hasattr(x, 'shape') or len(x.shape) < 1:

if (not hasattr(x, 'shape') or

not hasattr(x, 'ndim') or

Copy link

Member Author

@jklymak jklymak May 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to fix the problem. and see the test. However I am not sure if this is really on us, or the Kernel, which is missing ndim. Or maybe we should not be relying on ndim.

Copy link

Member

@oscargus oscargus May 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ndim always len(shape)? (I guess not, but otherwise it make make sense to just use that instead?)

Copy link

Member

@oscargus oscargus May 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 120 ndim in the Python code base (although some are in tests or non-Python-code), so even though we probably should fix it here, in the long run it may be better that Kernel gets ndim.

Copy link

Contributor

@mhvk mhvk May 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that Kernel does not really properly behave like an ndarray, so it does need to be converted to an array here. We're (ab)using the fact that it does not define ndim as a proxy for "I'm not a proper duck type, just happen to have shape". Really one would like the np.asduckarray that has been discussed in numpy...

@timhoffm

Copy link

Member

timhoffm commented May 5, 2022 •

edited

Loading

But the real question is how to determine whether something is an ndarray_ducktype ...

That cannot be answered generally. Duck-typing is an operational concept: If it provides the functions I need, it's a duck-type to me. Duck types are never perfect (at the very least they'd fail isinstance() behavior), but if I only need objects that quack like a duck I don't care whether it walks like a duck as well.

The only thing we could do is hard-code a list of known good-enough duck-types. Otherwise we'd have to query the object upfront for all the behavior we want which is practially impossible.
A third theoretical option would be to fully embrace ducktyping: Try with the object as is, if that fails convert the object to a numpy array and retry. However, that can lead to performance issues and more importantly our code is not designed for retries, so we might get into consistency issues if the first try partially modifies some state.

Copy link

Member Author

jklymak commented May 6, 2022

Just writing as I sort this out in my head.

Most of our methods do something like

 x, y = self._process_unit_info([("x", x), ("y", y)], kwargs)
 x = np.ma.ravel(x) # or alternately np.asarray(x)
 y = np.ma.ravel(y)

by which point we have set the units, and x and y are now type ndarray.

The ax.plot method, on the other hand, needs to unpack x and y into individual arguments so we do:

 x = _check_1d(xy[0])
 y = _check_1d(xy[1])
 self.axes.xaxis.update_units(x)
 self.axes.yaxis.update_units(y)

However, we do not want to lose units yet, and indeed we do not, at least according to test_units.py::test_numpy_facade, which claims we have the ability to transform units at any time with ax.xaxis.set_units. However, note that ax.xaxis.set_units absolutely fails with scatter (see #9713 (comment)) so dragging around the unit information only works for plot currently.

Hence we are in a bit of a Gordian knot of compatibility. We have enshrined pint as something we support, but only for plot apparently. We allow __array__ but currently depend on it having ndim and shape. If an array doesn't have shape then we np.atleast_1d it. I suppose we can do the same thing to arrays without ndim.

@jklymak jklymak marked this pull request as ready for review

May 6, 2022 21:10

Copy link

Member Author

jklymak commented May 6, 2022

This works, but I really feel this is all a mess. Hopefully one of our RSE's can work on this...

@jklymak jklymak added this to the v3.5.3 milestone

May 6, 2022

@jklymak jklymak added the Release critical label

May 6, 2022


 FIX: fix check_1d to also check for ndim

29ce2c2

Arrays sometimes don't have all the methods arrays should have, so
add another check here. Plot requires both ndim and shape and this
will extract the numpy array if x does not have those attributes.
Otherwise leave the object alone, because unit support (currently only
in plot) requires the object to retain the unit info.

@jklymak jklymak force-pushed the mnt-fix-array-numpy-second-try branch from e2da0f9 to 29ce2c2 Compare

May 6, 2022 21:14

@mhvk

Copy link

Contributor

mhvk commented May 6, 2022

@jklymak - indeed, also for astropy unit support works really well for plots, but not for images (cannot give units for an extent). Would be nice if that could also be exchanged. I think for the particular bug that triggered all this, we can also adjust on our side -- it is not something used all the time.

Copy link

Member Author

jklymak commented May 6, 2022

Just to clarify my comment, again while I am thinking about it - ideally plot would do the same thing as the other methods and convert to numpy immediately after the units have been determined. But currently we cannot because plot (and only plot) tries to preserve unit information.

In my mind what the proper way to do this would be to

determine units of the object and pass to the axis
use the units to covert the unit-full object to numpy into some base unit for the axis (so if the converter allows 'feet', 'miles', and 'meters', convert to floats in meters).
Allow the axis to scale its data in the units that the user wants (so if they specify 'feet', add a transform from meters to feet to the displayed data, but keep the underlying data in meters).

This way Matplotlib wouldn't need to drag around the extra units internally, it would just have a transform on each axis.

@mhvk

Copy link

Contributor

mhvk commented May 6, 2022

yes, that makes sense

@dopplershift

Copy link

Contributor

dopplershift commented May 7, 2022

I think the reason it's currently set up this way with Pint and plot() is because plot() is the only one I've personally needed to adjust axis units--whereas most of my use of other plot units is spatially, where units only matter for the z/image axis (e.g. imshow, contour).

I'm not sure how you can tweak the current conversion interface without breaking backwards compatibility...

Copy link

Member Author

jklymak commented May 7, 2022

I think my proposal would keep the same public user interface but change the internal representation of data passed in w units.

I think it could also be done so that most features work for downstream libraries the way they do now for scatter (not able to change axis units properly) but to have the plot-like full feature downstream libraries would need to add more logic to their locators and formatters.

Copy link

Member Author

jklymak commented May 8, 2022

See #23015 for more discussion of the convert-immediately strategy for dealing with units.

@jklymak jklymak added the topic: units and array ducktypes label

May 8, 2022

Copy link

Member Author

jklymak commented May 18, 2022

Despite the discussion above, I think this should go in as-is for 3.5.3....

oscargus

oscargus approved these changes

Copy link

Member

@oscargus oscargus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should indeed go in for 3.5.3 to fix the regression.

timhoffm

timhoffm approved these changes

@timhoffm timhoffm merged commit da84e38 into matplotlib:main

@meeseeksmachine meeseeksmachine mentioned this pull request

Backport PR #22975 on branch v3.5.x (MNT: fix __array__ to numpy) #23064

Merged

meeseeksmachine pushed a commit to meeseeksmachine/matplotlib that referenced this pull request

@timhoffm @meeseeksmachine


 Backport PR matplotlib#22975 : MNT: fix __array__ to numpy

99b7695

@jklymak jklymak deleted the mnt-fix-array-numpy-second-try branch

May 18, 2022 12:12

@WilliamJamieson WilliamJamieson mentioned this pull request