- 
 
- 
  Notifications
 You must be signed in to change notification settings 
- Fork 2.7k
patch: deepcopy figure fix #4926
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
patch: deepcopy figure fix #4926
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 different black version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh interesting! Yeah not sure why that's happening but it looks fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I would not worry too much about this, I just wonder why that's the case 😇
I'd love to see a test that fails before these changes! I'm a little concerned that the existing ones didn't catch this
Hey @marthacryan , thanks for getting back to me. Should the example in the issue enough as a test?
 
 
 
 stefanv
 
 
 
 commented
 Dec 5, 2024 
 
 
 
Here's a reproducing example that came out of the skimage CI:
import numpy as np import plotly import plotly.express as px import plotly.graph_objects as go plotly.io.renderers.default = "sphinx_gallery_png" img = np.zeros((50, 50)) fig = px.imshow(img, binary_string=True) fig.add_trace( go.Scatter( x=np.array([[1,2,3],[4,5,6]]), y=np.array([[1,2,3],[4,5,6]]), name="1" ) ) plotly.io.show(fig)
Which results in:
ValueError:
 Invalid value of type 'builtins.dict' received for the 'x' property of scatter
 Received value: {'dtype': 'i1', 'bdata': 'AQIDBAUG', 'shape': '2, 3'}
 The 'x' property is an array that may be specified as a tuple,
 list, numpy array, or pandas Series
If you toggle binary_string to False, it brings up another unexpected error:
ValueError:
 Invalid value of type 'builtins.dict' received for the 'z' property of heatmap
 Received value: {'dtype': 'f8', 'bdata': 'AAAAAAAAAAAAAAAAAA...AAA=', 'shape': '50, 50'}
 The 'z' property is an array that may be specified as a tuple,
 list, numpy array, or pandas Series
This is plotly 6.0.0rc0 and kaleido 1.0.0rc0.
I am not particularly confident with this part of the code. Minimal example failing in master:
import copy import plotly.express as px gapminder = px.data.gapminder() fig = px.line(gapminder, x="year", y="gdpPercap", color="country") fig_copy = copy.deepcopy(fig)
The entire error trace gives some more insights, here what I could figure out (maybe it is trivial, yet it wasn't for me)
- deepcopycalls- BaseFigure.__reduce__, which calls- BaseFigure.to_dict, which calls- convert_to_base64.
- At this point every value is converted to the following form:
'x' = {'dtype': 'i2', 'bdata': 'oAelB6oHrwe0B7kHvgfDB8gHzQfSB9cH'} 'y' = {'dtype': 'f8', 'bdata': 'ESBU8...7M'} 
- Then deepcopywill call_reconstruct, which tries to runFigure(*converted_values), henceBaseFigure(*converted_values). Finally herevalidate_coerceis hit and it will fail.
Hope it helps
Ah yeah so I can give a little context here. The approach we took for sending base64 encoded arrays was to add the encoding step to the to_json function. Our thinking here was that this would be basically the last step before sending data to the frontend, so it would have minimal effects in the rest of the codebase. The performance improvements from base64 encoding arrays are mostly in the step of sending data from python to the client. However, we didn't expect that the base64 encoded arrays (which are also called typed array specs in parts of the code) would be used as input for the Figure class. That's why we're skipping the array validation if we encounter this datatype.
I'm not sure why this isn't failing for other deepcopy tests, but one guess I have is that they aren't numpy/pandas arrays? The base64 encoding only happens for numpy/pandas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test looks good thank you for doing that! Do you think you could add a simple check that the data did copy correctly? Also, where are the other deepcopy tests you mentioned? Does it make more sense to include this test with those?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @marthacryan , what's the best way to assert that the values are the same? is it fig_copy.to_dict() == fig.to_dict()?
Here are few other figure deepcopy's in the test suite:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using fig.to_dict() is fine here. There are other ways to compare them but I just wanted a baseline check that the data is still there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks @FBruzzesi!
Code PR
Closes #4925 by reverting the changes in 960adb9 to check for
is_none_or_typed_array_spec.Looking for indication if worth adding a test. I see that in few tests a deepcopy of figure is called, thus... I am a bit confused 🙃