Contributor

@jonmmease jonmmease commented Apr 9, 2020 (edited)

Overview

This PR contains a variety of optimizations targeted at improving plotly.py's import and Figure creation/serialization speed.

Lazy submodule imports in Python 3.7+

PEP-562 in Python 3.7 introduced a nice approach for implementing lazy loading of submodules. The top-level plotly/__init__.py, plotly/io/__init__.py and the full graph_objs hierarchy have been updated to use lazy submodule importing for Python 3.7+. For older Python versions, all submodule imports are still performed immediately.

Part of this process involved codegen updates to split graph object and validator classes into their own files.
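
As a rough illustration of the PEP-562 mechanism described above, here is a minimal sketch. It is not the code from this PR: `make_lazy_module`, `lazy_pkg`, and the attribute names are hypothetical, and stdlib modules stand in for real submodules. In an actual package, `__getattr__` would simply be defined at the top level of `__init__.py`.

```python
import importlib
import sys
import types


def make_lazy_module(name, submodules):
    """Return a module that imports the listed submodules on first access.

    `submodules` maps an attribute name to the dotted path of the module
    that should be imported when that attribute is first requested.
    """
    module = types.ModuleType(name)

    def __getattr__(attr):
        # PEP 562: called only after normal attribute lookup has failed.
        if attr in submodules:
            loaded = importlib.import_module(submodules[attr])
            # Cache the result so later lookups never reach __getattr__.
            setattr(module, attr, loaded)
            return loaded
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    # Storing the function in the module namespace has the same effect as
    # defining it at the top level of a package's __init__.py.
    module.__getattr__ = __getattr__
    return module


# Hypothetical package exposing two stdlib modules as lazy "submodules".
lazy_pkg = make_lazy_module("lazy_pkg", {"io_tools": "json", "shapes": "fractions"})
cached_before = "shapes" in vars(lazy_pkg)   # False: nothing imported yet
shapes = lazy_pkg.shapes                     # first access runs the import
cached_after = "shapes" in vars(lazy_pkg)    # True: now stored on the module
```

Because the imported module is cached on the parent after the first access, the lookup cost is paid only once per attribute.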

Lazy creation of validators

Previously, each graph object would instantiate a set of validators (one per property) in the constructor. Now, validators are constructed when first used and stored in a global cache (plotly/validator_cache.py).
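
The idea can be sketched roughly as follows. Everything here is a hypothetical stand-in (`NumberValidator`, `get_validator`, `Marker`, and the cache layout are assumptions for illustration, not the contents of plotly/validator_cache.py); the point is only that constructing objects creates no validators and that each validator is built once and shared.

```python
class NumberValidator:
    """Stand-in validator; counts how many instances get built."""

    instances_created = 0

    def __init__(self, parent_path, prop_name):
        NumberValidator.instances_created += 1
        self.parent_path = parent_path
        self.prop_name = prop_name

    def validate_coerce(self, value):
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{self.parent_path}.{self.prop_name} must be a number")
        return value


# Global cache keyed by (object path, property name) -- an assumed layout.
_validator_cache = {}


def get_validator(parent_path, prop_name):
    """Build the validator on first request and reuse it afterwards."""
    key = (parent_path, prop_name)
    if key not in _validator_cache:
        _validator_cache[key] = NumberValidator(parent_path, prop_name)
    return _validator_cache[key]


class Marker:
    _path = "scatter.marker"

    def __init__(self):
        self._props = {}  # note: no validators are created here

    def __setitem__(self, prop, value):
        validator = get_validator(self._path, prop)
        self._props[prop] = validator.validate_coerce(value)


first, second = Marker(), Marker()
created_at_construction = NumberValidator.instances_created  # 0
first["size"] = 10
second["size"] = 12
created_after_use = NumberValidator.instances_created  # 1, shared by both objects
```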

Lazy creation of child graph objects

Previously, child graph objects were created in the constructor, and they were initialized for every possible property. Now, child graph objects are initialized either on property access or when the property is set to a non-None value (if validation is enabled, see below).
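
A minimal sketch of this pattern, using hypothetical `Title` and `Font` classes rather than the generated graph object classes, might look like this. The child is created only when it is read or when it is assigned a non-None value.

```python
class Font:
    """Stand-in child object; counts how many instances get built."""

    created = 0

    def __init__(self, **props):
        Font.created += 1
        self.props = dict(props)


class Title:
    def __init__(self, font=None):
        self._font = None          # child is not built up front
        if font is not None:
            self.font = font       # built only because a value was supplied

    @property
    def font(self):
        # Create the child the first time somebody asks for it.
        if self._font is None:
            self._font = Font()
        return self._font

    @font.setter
    def font(self, value):
        if value is None:
            self._font = None
        elif isinstance(value, Font):
            self._font = value
        else:
            self._font = Font(**value)  # e.g. a dict of properties


title = Title()
built_by_constructor = Font.created   # 0
title.font                            # property access creates the child
built_after_access = Font.created     # 1
Title(font={"size": 12})              # non-None value creates the child
built_after_assignment = Font.created  # 2
```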

Avoid loading numpy and pandas when not in use

In several places in the codebase, we attempt to import numpy/pandas using our get_module function, and then use the pandas/numpy module handle to check whether an argument is a data structure from that library. The get_module function now has a should_load option. When set to False, get_module will only return the module if it is already loaded. This is useful because if pandas isn't loaded, then we don't need to check whether a value is a DataFrame. This avoids the pandas/numpy import cost, saving ~200ms when these libraries are installed but not in use.
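
The behavior described above can be approximated with `sys.modules`. This is a sketch under assumptions, not the PR's implementation: the function body and the `is_dataframe` helper are illustrative, and only the `get_module` name and `should_load` option come from the description.

```python
import importlib
import sys


def get_module(name, should_load=True):
    """Return the named module, or None if it is unavailable.

    With should_load=False the module is returned only if something else
    has already imported it, so this call never pays the import cost.
    """
    already_loaded = sys.modules.get(name)
    if already_loaded is not None:
        return already_loaded
    if not should_load:
        return None
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def is_dataframe(value):
    """Hypothetical helper: a DataFrame can only exist if pandas is loaded."""
    pd = get_module("pandas", should_load=False)
    return pd is not None and isinstance(value, pd.DataFrame)


print(is_dataframe([1, 2, 3]))  # False, and pandas is not imported to find out
```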

Avoid dynamic docstring generation

This PR removes the dynamic docstring generation that was used to populate the docstrings for the Figure methods corresponding to plotly.io functions (e.g. the Figure.show docstring was created by transforming that of plotly.io.show). These docstrings are now added statically. This saves ~200ms on import time.

Support optional validation

This PR adds support for disabling property validation using the go.validate object. This can be used as a callable to enable/disable validation for the session (e.g. go.validate(False)), or it can be used as a context manager to enable/disable validation within a block of code (e.g. with go.validate(False):).

API inspired by Bokeh's implementation in bokeh/bokeh#6042.
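
One way an object can serve as both a plain callable and a context manager is sketched below. This is only an illustration of the pattern, assuming hypothetical `ValidationToggle`, `_Restore`, and `set_opacity` names; it does not reproduce the PR's or Bokeh's implementation.

```python
class _Restore:
    """Context manager that puts the previous setting back on exit."""

    def __init__(self, toggle, previous):
        self._toggle = toggle
        self._previous = previous

    def __enter__(self):
        return self._toggle

    def __exit__(self, exc_type, exc_value, traceback):
        self._toggle.enabled = self._previous
        return False  # never swallow exceptions


class ValidationToggle:
    def __init__(self):
        self.enabled = True

    def __call__(self, enabled):
        # The new setting takes effect immediately. If the return value is
        # used in a `with` statement, the old setting is restored on exit;
        # if it is discarded, the new setting simply stays in place.
        previous = self.enabled
        self.enabled = bool(enabled)
        return _Restore(self, previous)


validate = ValidationToggle()


def set_opacity(props, value):
    """Hypothetical property setter that honors the toggle."""
    if validate.enabled and not (isinstance(value, (int, float)) and 0 <= value <= 1):
        raise ValueError(f"opacity must be a number in [0, 1], got {value!r}")
    props["opacity"] = value


props = {}
with validate(False):
    set_opacity(props, "not a number")   # accepted: validation is off here
inside_block = props["opacity"]

validate(False)          # plain call: stays off until changed again
set_opacity(props, 5)
validate(True)
```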

Results

Here are some before/after performance results on Python 3.7 with this PR:

top-level import

%%time
import plotly

Version 4.6: 239 ms
PR: 2.5ms
95x speedup

Import, create empty figure, and serialize to JSON

%%time
import plotly.graph_objects as go
go.Figure().to_json()

Version 4.6: 696 ms
PR: 27ms
25x speedup

Repeatedly create empty figure and serialize to json (after import)

%%timeit
go.Figure().to_json()

Version 4.6: 68 ms
PR: 1.5ms
45x speedup

Import, load data, create animated plotly express figure, serialize to json

%%time
import plotly.express as px
df = px.data.gapminder()
fig = px.scatter(df, x="gdpPercap", y="lifeExp", animation_frame="year", animation_group="country",
 size="pop", color="continent", hover_name="country", facet_col="continent",
 log_x=True, size_max=45, range_x=[100,100000], range_y=[25,90])
fig.to_json()

Version 4.6: 1530 ms
PR: 550 ms
2.7x speedup

Repeatedly create px plot after import and data are loaded

%%timeit
fig = px.scatter(df, x="gdpPercap", y="lifeExp", animation_frame="year", animation_group="country",
 size="pop", color="continent", hover_name="country", facet_col="continent",
 log_x=True, size_max=45, range_x=[100,100000], range_y=[25,90])
fig.to_json()

Version 4.6: 663 ms
PR: 167 ms
4x speedup

Import, load data, create animated plotly express figure, serialize to json, skip validation

%%time
import plotly.express as px
import plotly.graph_objects as go
df = px.data.gapminder()
with go.validate(False):
    fig = px.scatter(df, x="gdpPercap", y="lifeExp", animation_frame="year", animation_group="country",
                     size="pop", color="continent", hover_name="country", facet_col="continent",
                     log_x=True, size_max=45, range_x=[100,100000], range_y=[25,90])
    fig.to_json()

PR (no validation): 449 ms
PR (with validation): 550 ms
Version 4.6: 1530 ms

Repeatedly import, load data, create animated plotly express figure, serialize to json, skip validation

%%timeit
with go.validate(False):
    fig = px.scatter(df, x="gdpPercap", y="lifeExp", animation_frame="year", animation_group="country",
                     size="pop", color="continent", hover_name="country", facet_col="continent",
                     log_x=True, size_max=45, range_x=[100,100000], range_y=[25,90])

PR (no validation): 127ms
PR (with validation): 167ms
Version 4.6: 663 ms

cc @nicolaskruchten @emmanuelle

jonmmease added 8 commits

April 10, 2020 10:39


 Lazy imports of graph object hierarchy for Python 3.7+

4ee88bc

This involved splitting validators/graph object classes back into separate files


 lazy submodule imports for plotly and plotly.io modules

cb70a40


 Don't force import pandas and numpy if we're only checking whether an object belongs to these packages

ceb9523

If pandas isn't loaded, we don't need to check whether a value is a DataFrame, and this way we don't pay the pandas import time.


 Don't auto-import all trace type classes

5974cf5


 Don't construct validators up front for every object when object is created.

ffc277d

Create them lazily, and cache them for use across graph objects of the same type


 Test fixes

181e427


 Add optional validation using go.validate object as callable or context manager

aec7e9a


 Delay additional imports

1084ba0

@jonmmease jonmmease force-pushed the import_init_optimization branch from ecdfca2 to 4a8ea51

April 10, 2020 16:18


 dynamic to static docstrings for Figure io methods

7e50c44

Saves 200ms of startup time!

@jonmmease jonmmease force-pushed the import_init_optimization branch from 4a8ea51 to 7e50c44

April 10, 2020 16:24

@jonmmease jonmmease changed the title from "Import init optimization" to "Import and initialization optimizations"

Apr 10, 2020

[Feature Request] Allow direct import of utils.PlotlyJSONEncoder for faster Dash startup time #2174

Contributor Author

jonmmease commented Apr 10, 2020

cc @alexcjohnson @chriddyp. These changes should help improve the responsiveness of Dash hot-reload, and should significantly reduce the performance cost of using graph objects and px to generate Figures in Dash callbacks.

This was referenced Apr 10, 2020

Closed

Very slow performance of create_annotated_heatmap for small dataset #2299

Closed

Member

chriddyp commented Apr 10, 2020

Very nice! Quarter second speed up on import & half second speed up when creating px figures... really impressive! 🐎

@jonmmease jonmmease mentioned this pull request

Apr 10, 2020

Importing plotly takes a lot of time #740

Closed

Contributor

nicolaskruchten commented Apr 11, 2020

Very nice! Might take me a sec to review ;)

image

(j/k I know most of those are codegen'ed)

Contributor

emmanuelle commented Apr 11, 2020

Regarding review, is there any way to re-order/squash some commits so that it is possible to review independently the non-codegened part?

Contributor Author

jonmmease commented Apr 11, 2020

Good point, @emmanuelle and @nicolaskruchten. Sorry for not providing a commit overview.

All of the codegen changes are in 4ee88bc. For that commit, the hand-edited changes are in the following files (everything else is codegen output):

packages/python/plotly/_plotly_utils/importers.py (4ee88bc)
packages/python/plotly/codegen/__init__.py (4ee88bc#diff-a08c8c3dc3faeb46a8a7a7eabb8da789)
packages/python/plotly/codegen/datatypes.py (4ee88bc#diff-9621264aeec2343e3c66b9920e449d85)
packages/python/plotly/codegen/figure.py (4ee88bc#diff-6cda9ae3edaeb3c80537c48218709489)
packages/python/plotly/codegen/utils.py (4ee88bc#diff-686bfdad0d0b0d7117728f3ead360ccb)
packages/python/plotly/codegen/validators.py (4ee88bc#diff-e48bce2a16745a3b4ea9dc7f242d09f1)

The rest of the commits can be reviewed individually and do not include codegen changes.

Thanks!

Contributor

nicolaskruchten commented Apr 14, 2020

I don't have a full grasp of the changes made here but the descriptions make sense, the tests pass, the docs build and most CI jobs seem to go faster, so I'd call this a win :)

Contributor

nicolaskruchten commented Apr 14, 2020

💃 unless objections

Contributor

nicolaskruchten commented Apr 14, 2020

WOW... pytest plotly/tests/test_core/test_px/ goes from 43 seconds to 8 seconds on my machine! 5x speedup!

emmanuelle reviewed

Apr 14, 2020

packages/python/plotly/_plotly_utils/importers.py Outdated

# Check for submodule

if import_name in module_names:

# print(parent_name, import_name)

Contributor

@emmanuelle emmanuelle Apr 14, 2020

please remove commented out print statements

Contributor

emmanuelle commented Apr 14, 2020

So in this branch I cannot get tab completion to work for go objects in ipython and jupyter, probably because of lazy loading. Is there any way to keep the performance improvement but get tab completion back? Maybe populating the __all__ variable of each submodule would work (haven't checked).

Contributor

emmanuelle commented Apr 14, 2020

Also, I generated the API doc on this branch and some links to go classes are broken. I need to understand why and how it can be fixed.

Contributor

nicolaskruchten commented Apr 14, 2020

So in this branch I cannot get the tab completion to work for go objects in ipython and jupyter

Can you provide some more details on the exact scenario you're trying, including versions of various things? I'm on Python 3.7.7 / JupyterLab 2.1 and I'm trying fig = px.scatter(x=[1,2,3]) then fig.lay<tab> and I can complete layout and then .hov<tab> and I can complete hoverlabel and then .b<tab> completes bgcolor so this appears to work 'all the way down'. This works in both /lab and /notebooks.

Contributor

nicolaskruchten commented Apr 14, 2020

This also works at the command-line with ipython, for me.

Contributor

emmanuelle commented Apr 14, 2020 (edited)

@nicolaskruchten what you describe works for me too. What does not work is to do go.La + TAB, go.Layout.bar + TAB, go.Choro + TAB, etc. Python 3.7.3 here, ipython 7.8.0 and notebook server 5.7.4. (after doing import plotly.graph_objects as go, of course 😁 )

Contributor Author

jonmmease commented Apr 15, 2020

Ahh, I think I have a solution. Looks like ipython honors the module-level __dir__() function that was defined in PEP-562 (https://www.python.org/dev/peps/pep-0562/) along with __getattr__().

I'll add this function to the codegen and update the PR. Hopefully this will also solve the documentation generation issue @emmanuelle mentioned.
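
For illustration, a module-level `__dir__` can advertise names that have not been imported yet, which is what tab completion relies on. The sketch below is an assumption-laden stand-in for the generated code: `make_lazy_module`, `demo`, and the exported names are hypothetical, and stdlib classes play the role of graph object classes.

```python
import importlib
import types


def make_lazy_module(name, lazy_attrs):
    """`lazy_attrs` maps an exported name to (module path, attribute name)."""
    module = types.ModuleType(name)

    def __getattr__(attr):
        # PEP 562: resolve the attribute on first access and cache it.
        if attr in lazy_attrs:
            module_path, attr_name = lazy_attrs[attr]
            value = getattr(importlib.import_module(module_path), attr_name)
            setattr(module, attr, value)
            return value
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    def __dir__():
        # PEP 562: dir(module) calls this, so lazy names are listed
        # alongside whatever is already present in the namespace.
        return sorted(set(vars(module)) | set(lazy_attrs))

    module.__getattr__ = __getattr__
    module.__dir__ = __dir__
    return module


demo = make_lazy_module(
    "demo",
    {"Fraction": ("fractions", "Fraction"), "Decimal": ("decimal", "Decimal")},
)
listed = dir(demo)                       # includes "Fraction" and "Decimal"
loaded_yet = "Fraction" in vars(demo)    # False: listing does not import
```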

Contributor

nicolaskruchten commented Apr 15, 2020

Cool! So in principle this shouldn't be different in 3.6?

Contributor

emmanuelle commented Apr 15, 2020

I also have jedi 0.15.1 installed.

Contributor

emmanuelle commented Apr 15, 2020

@jonmmease it'd be awesome if it's possible to have the best of both worlds :-).

@nicolaskruchten I tried this branch in conda envs with specific versions of Python: with py3.6, tab completion works well with go. + TAB, but with py3.8 it does not work. Is it a Linux thing then? To be continued...

Contributor

nicolaskruchten commented Apr 15, 2020

Well if it’s broken only on 3.7+ then maybe Jon’s upcoming fix will resolve it! Thanks for checking! My fear was that it would also be broken in 3.6 but if that’s not the case then we’re in luck :)


 Use module-level __dir__ function to restore IPython tab completion with Python 3.7+

f7e672b

Contributor Author

jonmmease commented Apr 16, 2020

OK, IPython tab completion seems to be working well for me now with Python 3.7. Please let me know what you see in your environments! Thanks

jonmmease added 2 commits

April 16, 2020 10:15


 Don't fail hierarchy test when ipywidgets not installed

20fc393


 cut commented print statements

f979814

Contributor

emmanuelle commented Apr 16, 2020

So there is some progress for me in Python 3.7 (pip env), since I can now do go.F + TAB to get go.Figure, or go.B + TAB for go.Bar, but I cannot go deeper in the hierarchy; for example, go.bar.M + TAB does not return anything.


 Test fix, FigureWidget is not expected to be importable when ipywidgets is not installed

7600eaf

Contributor

nicolaskruchten commented Apr 16, 2020

@jonmmease are you able to replicate Emma's issues at all locally? Just in a shell with ipython ?

Screenshot_20200416_133605

Contributor Author

jonmmease commented Apr 16, 2020

Huh, this is working for me on Python 3.7 with ipython 7.13 😕

(Despite the environment name, this is Python 3.7 🙈)

@emmanuelle any improvement in the behavior of documentation generation?

@nicolaskruchten how do things look for you?

Contributor

nicolaskruchten commented Apr 16, 2020

@jonmmease is this Linux? I'll check locally in a bit

Screenshot_20200416_134612

Contributor Author

jonmmease commented Apr 16, 2020 (edited)

Yeah, I'm on Linux. It's also working in plain vanilla python repl

Contributor

nicolaskruchten commented Apr 16, 2020

no change for me, I can tab-complete to go.Layout.hoverlabel but not through to .bgcolor. Which, BTW, is fine by me.

Contributor

nicolaskruchten commented Apr 16, 2020

If I instantiate fig = go.Figure() then I seem to be able to drill down arbitrarily deeply: fig.layout.hoverlabel.font.color. I can also go arbitrarily deeply with go.bar.marker.coloraxis so long as I stay lower-cased, but not into e.g. go.bar.Marker.whatever