Import and initialization optimizations #2368

Merged: jonmmease merged 14 commits into master from import_init_optimization on Apr 16, 2020

@jonmmease (Contributor) commented Apr 9, 2020

Overview

This PR contains a variety of optimizations aimed at improving plotly.py's import speed and its Figure creation/serialization speed.

Lazy submodule imports in Python 3.7+

PEP 562, implemented in Python 3.7, added support for a module-level __getattr__ function, which provides a clean way to implement lazy loading of submodules. The top-level plotly/__init__.py, plotly/io/__init__.py, and the full graph_objs hierarchy have been updated to use lazy submodule importing on Python 3.7+. On older Python versions, all submodule imports are still performed eagerly.
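A minimal sketch of the PEP 562 pattern follows. It assumes a hypothetical helper, `lazy_submodules`, that maps attribute names to importable module names; plotly's generated `__init__.py` files may be organized differently. To keep the example runnable as a single file, the hook is attached to a synthetic module object, and the standard-library `fractions` module stands in for a submodule.

```python
import importlib
import sys
import types


def lazy_submodules(module_name, submodules):
    """Build a PEP 562 module-level __getattr__ (hypothetical helper).

    `submodules` maps attribute names to fully qualified module names.
    Nothing is imported until an attribute is first accessed.
    """

    def __getattr__(name):
        if name in submodules:
            module = importlib.import_module(submodules[name])
            # Cache on the parent so later lookups skip __getattr__ entirely
            setattr(sys.modules[module_name], name, module)
            return module
        raise AttributeError(f"module {module_name!r} has no attribute {name!r}")

    return __getattr__


# Synthetic parent module standing in for e.g. a graph_objs package
demo = types.ModuleType("lazy_demo")
sys.modules["lazy_demo"] = demo
demo.__getattr__ = lazy_submodules("lazy_demo", {"fractions": "fractions"})

print("fractions" in vars(demo))  # False: not yet loaded through the parent
print(demo.fractions.Fraction(1, 3))  # first access triggers the import
print("fractions" in vars(demo))  # True: now cached on the parent
```

In a real package, the `__init__.py` would assign `__getattr__ = lazy_submodules(__name__, {...})` at module level on Python 3.7+ and fall back to eager `from . import ...` statements when `sys.version_info < (3, 7)`.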

Part of this process involved codegen updates to split graph object and validator classes into their own files.

Lazy creation of validators

Previously, each graph object instantiated a full set of validators (one per property) in its constructor. Now, each validator is constructed the first time it is used and stored in a global cache (plotly/validator_cache.py), so it can be reused across graph objects of the same type.
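The lazy, cached construction can be sketched as follows. `RangeValidator`, `_FACTORIES`, and `get_validator` are hypothetical stand-ins, not the contents of `plotly/validator_cache.py`; the sketch only shows that a validator is built on first request and then shared.

```python
from typing import Callable, Dict, Tuple


class RangeValidator:
    """Hypothetical stand-in for a generated per-property validator class."""

    constructed = 0  # counts instantiations, to show the cache at work

    def __init__(self, parent_name: str, plotly_name: str, lo: float, hi: float):
        RangeValidator.constructed += 1
        self.parent_name = parent_name
        self.plotly_name = plotly_name
        self.lo, self.hi = lo, hi

    def validate(self, value):
        if not self.lo <= value <= self.hi:
            raise ValueError(
                f"{self.parent_name}.{self.plotly_name} must be in "
                f"[{self.lo}, {self.hi}], got {value!r}"
            )
        return value


Key = Tuple[str, str]  # (parent path, property name)

# Recipes for building validators; nothing is instantiated at import time
_FACTORIES: Dict[Key, Callable[[], RangeValidator]] = {
    ("scatter.marker", "opacity"): lambda: RangeValidator("scatter.marker", "opacity", 0, 1),
}

# Global cache shared by every graph object
_VALIDATOR_CACHE: Dict[Key, RangeValidator] = {}


def get_validator(parent_path: str, prop_name: str) -> RangeValidator:
    key = (parent_path, prop_name)
    validator = _VALIDATOR_CACHE.get(key)
    if validator is None:
        validator = _FACTORIES[key]()  # built on first use only
        _VALIDATOR_CACHE[key] = validator
    return validator


v1 = get_validator("scatter.marker", "opacity")
v2 = get_validator("scatter.marker", "opacity")
print(v1 is v2, RangeValidator.constructed)  # True 1
```

Keying the cache by (parent path, property name) means two `scatter.marker` objects share one opacity validator, while an object that never touches a property never pays for its validator.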

Lazy creation of child graph objects

Previously, child graph objects were created in the constructor for every possible property. Now, a child graph object is initialized only when its property is accessed, or when the property is set to a non-None value (if validation is enabled; see below).
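The lazy-child behavior can be sketched with a Python property that builds the child on first access. `Scatter`, `Marker`, and `to_dict` are hypothetical stand-ins, not plotly's generated classes.

```python
class Marker:
    """Hypothetical child graph object."""

    created = 0  # counts instantiations

    def __init__(self, color=None):
        Marker.created += 1
        self.color = color


class Scatter:
    """Hypothetical parent graph object with one lazily created child."""

    def __init__(self, marker=None):
        self._marker = None  # no Marker is built in the constructor
        if marker is not None:
            self.marker = marker  # built only when a non-None value is given

    @property
    def marker(self):
        if self._marker is None:
            self._marker = Marker()  # created on first attribute access
        return self._marker

    @marker.setter
    def marker(self, value):
        if value is None:
            self._marker = None
        elif isinstance(value, Marker):
            self._marker = value
        else:
            self._marker = Marker(**value)  # e.g. {"color": "red"}

    def to_dict(self):
        # Children that were never touched don't exist, so they add nothing here
        if self._marker is None:
            return {}
        return {"marker": {"color": self._marker.color}}


empty = Scatter()
print(Marker.created, empty.to_dict())  # 0 {}
red = Scatter(marker={"color": "red"})
print(Marker.created, red.to_dict())  # 1 {'marker': {'color': 'red'}}
```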

Avoid loading numpy and pandas when not in use

In several places, the codebase imports numpy/pandas using our get_module function and then uses the module handle to check whether an argument is a data structure from that library. get_module now has a should_load option: when it is set to False, get_module returns the module only if it has already been imported. This works because if pandas hasn't been imported, no value can be a DataFrame, so there is nothing to check. As a result, we no longer pay the pandas/numpy import cost when these libraries are installed but not in use, which saves ~200 ms.
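A minimal sketch of the `should_load` idea, assuming a signature of the form `get_module(name, should_load=True)`; the real helper in plotly.py may differ in details such as remembering failed imports. `is_dataframe` is a hypothetical caller.

```python
import importlib
import sys


def get_module(name, should_load=True):
    """Return the module `name`, or None if it is unavailable.

    With should_load=False, only a module that other code has already
    imported is returned, so no new import cost is ever paid.
    """
    if name in sys.modules:
        return sys.modules[name]
    if not should_load:
        return None
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def is_dataframe(value):
    # If pandas was never imported, `value` cannot be a DataFrame,
    # so there is no reason to import pandas just to call isinstance().
    pd = get_module("pandas", should_load=False)
    return pd is not None and isinstance(value, pd.DataFrame)


print(is_dataframe([1, 2, 3]))  # False, without importing pandas as a side effect
```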

Avoid dynamic docstring generation

This PR removes the dynamic docstring generation that populated the docstrings of the Figure methods corresponding to plotly.io functions (e.g., the Figure.show docstring was created by transforming the docstring of plotly.io.show). These docstrings are now written statically, which saves ~200 ms of import time.

Support optional validation

This PR adds support for disabling property validation using the go.validation object. It can be used as a callable to enable/disable validation for the session (e.g., go.validation(False)), or as a context manager to enable/disable validation within a block of code (e.g., with go.validation(False):).

API inspired by Bokeh's implementation in bokeh/bokeh#6042.
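An object that works both as a plain call and as a context manager can be sketched like this. `ValidationToggle`, `set_property`, and `check_opacity` are hypothetical names; this is not the implementation in plotly.py or Bokeh, only one way to get the call-or-`with` behavior described above.

```python
class ValidationToggle:
    """Callable that also works as a context manager (hypothetical sketch)."""

    def __init__(self):
        self.enabled = True
        # Prior states, one pushed per call. Plain session-wide calls are
        # never popped, which is harmless for a toggle called rarely.
        self._previous = []

    def __call__(self, enabled):
        # A plain call changes the session-wide setting...
        self._previous.append(self.enabled)
        self.enabled = bool(enabled)
        return self  # ...and returning self makes `with toggle(False):` work

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Restore the state that was active before the matching call
        self.enabled = self._previous.pop()
        return False  # never swallow exceptions


validation = ValidationToggle()


def set_property(obj, name, value, check):
    """Hypothetical setter: run `check` only while validation is enabled."""
    if validation.enabled:
        value = check(value)
    obj[name] = value


def check_opacity(v):
    if not 0 <= v <= 1:
        raise ValueError(f"opacity must be in [0, 1], got {v!r}")
    return v


marker = {}
with validation(False):
    set_property(marker, "opacity", 5, check_opacity)  # stored unchecked
print(marker, validation.enabled)  # {'opacity': 5} True
```

Skipping validation trades safety for speed: invalid values such as the opacity of 5 above are stored without complaint, so this is best reserved for code paths that already produce valid input.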

Results

Here are some before/after performance results on Python 3.7 with this PR:

Top-level import

%%time
import plotly

Version 4.6: 239 ms
PR: 2.5 ms
95x speedup

Import, create empty figure, and serialize to JSON

%%time
import plotly.graph_objects as go
go.Figure().to_json()

Version 4.6: 696 ms
PR: 27 ms
25x speedup

Repeatedly create empty figure and serialize to JSON (after import)

%%timeit
go.Figure().to_json()

Version 4.6: 68 ms
PR: 1.5 ms
45x speedup

Import, load data, create animated plotly express figure, serialize to JSON

%%time
import plotly.express as px
df = px.data.gapminder()
fig = px.scatter(df, x="gdpPercap", y="lifeExp", animation_frame="year", animation_group="country",
                 size="pop", color="continent", hover_name="country", facet_col="continent",
                 log_x=True, size_max=45, range_x=[100,100000], range_y=[25,90])
fig.to_json()

Version 4.6: 1530 ms
PR: 550 ms
2.7x speedup

Repeatedly create px plot after import and data are loaded

%%timeit
fig = px.scatter(df, x="gdpPercap", y="lifeExp", animation_frame="year", animation_group="country",
                 size="pop", color="continent", hover_name="country", facet_col="continent",
                 log_x=True, size_max=45, range_x=[100,100000], range_y=[25,90])
fig.to_json()

Version 4.6: 663 ms
PR: 167 ms
4x speedup

Import, load data, create animated plotly express figure, serialize to JSON, skip validation

%%time
import plotly.express as px
import plotly.graph_objects as go
df = px.data.gapminder()
with go.validate(False):
    fig = px.scatter(df, x="gdpPercap", y="lifeExp", animation_frame="year", animation_group="country",
                     size="pop", color="continent", hover_name="country", facet_col="continent",
                     log_x=True, size_max=45, range_x=[100,100000], range_y=[25,90])
    fig.to_json()

PR (no validation): 449 ms
PR (with validation): 550 ms
Version 4.6: 1530 ms

Repeatedly create animated plotly express figure with validation skipped, after import and data are loaded

%%timeit
with go.validate(False):
    fig = px.scatter(df, x="gdpPercap", y="lifeExp", animation_frame="year", animation_group="country",
                     size="pop", color="continent", hover_name="country", facet_col="continent",
                     log_x=True, size_max=45, range_x=[100,100000], range_y=[25,90])

PR (no validation): 127 ms
PR (with validation): 167 ms
Version 4.6: 663 ms
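The `%%time` numbers above come from Jupyter cells. One way to take a comparable import-time measurement in a plain script is to time the import inside a fresh interpreter, so that modules already cached in `sys.modules` don't skew the result. This is a sketch; results vary by machine and by disk cache, and `python -X importtime -c "import plotly"` (Python 3.7+) gives a per-module breakdown instead of a single number.

```python
import subprocess
import sys


def time_fresh_import(module_name: str) -> float:
    """Seconds spent importing `module_name` in a brand-new interpreter."""
    code = (
        "import time\n"
        "t0 = time.perf_counter()\n"
        f"import {module_name}\n"
        "print(time.perf_counter() - t0)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if the import fails
    )
    return float(result.stdout)


# Swap in "plotly" or "plotly.graph_objects" to reproduce the comparison
print(f"json: {time_fresh_import('json') * 1000:.1f} ms")
```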

cc @nicolaskruchten @emmanuelle

@jonmmease force-pushed the import_init_optimization branch from ecdfca2 to 4a8ea51 on April 10, 2020 16:18
@jonmmease force-pushed the import_init_optimization branch from 4a8ea51 to 7e50c44 on April 10, 2020 16:24
@jonmmease changed the title from "Import init optimization" to "Import and initialization optimizations" on Apr 10, 2020
Contributor Author

cc @alexcjohnson @chriddyp. These changes should help improve the responsiveness of Dash hot-reload, and should significantly reduce the performance cost of using graph objects and px to generate Figures in Dash callbacks.

Member

Very nice! A quarter-second speedup on import and a half-second speedup when creating px figures... really impressive! 🐎

Contributor

Very nice! Might take me a sec to review ;)


(j/k I know most of those are codegen'ed)

Contributor

Regarding review, is there any way to reorder/squash some commits so that the non-codegened part can be reviewed independently?

Contributor Author

Good point, @emmanuelle and @nicolaskruchten. Sorry for not providing a commit overview.

All of the codegen changes are in 4ee88bc. For that commit, the hand-edited changes are in the following files (everything else is codegen output):

The rest of the commits can be reviewed individually and do not include codegen changes.

Thanks!

Contributor

I don't have a full grasp of the changes made here but the descriptions make sense, the tests pass, the docs build and most CI jobs seem to go faster, so I'd call this a win :)

Contributor

💃 unless objections


Contributor

WOW... pytest plotly/tests/test_core/test_px/ goes from 43 seconds to 8 seconds on my machine! 5x speedup!


# Check for submodule
if import_name in module_names:
    # print(parent_name, import_name)
Contributor

@emmanuelle commented Apr 14, 2020

please remove commented out print statements

Contributor

So in this branch I cannot get the tab completion to work for go objects in ipython and jupyter, probably because of lazy loading. Is there any way to keep the performance improvement but get the tab completion back? Maybe populating the __all__ variable of each submodule would work (haven't checked).

Contributor

Also I generated the API doc on this branch and some links to go classes are broken, I need to understand why and how it can be fixed.

Contributor

So in this branch I cannot get the tab completion to work for go objects in ipython and jupyter

Can you provide some more details on the exact scenario you're trying, including versions of various things? I'm on Python 3.7.7 / JupyterLab 2.1 and I'm trying fig = px.scatter(x=[1,2,3]) then fig.lay<tab> and I can complete layout and then .hov<tab> and I can complete hoverlabel and then .b<tab> completes bgcolor so this appears to work 'all the way down'. This works in both /lab and /notebooks.

Contributor

This also works at the command-line with ipython, for me.

Contributor

@emmanuelle commented Apr 14, 2020

@nicolaskruchten what you describe works for me too. What does not work is to do go.La + TAB, go.Layout.bar + TAB, go.Choro + TAB, etc. Python 3.7.3 here, ipython 7.8.0 and notebook server 5.7.4. (after doing import plotly.graph_objects as go, of course 😁 )

Contributor Author

Ahh, I think I have a solution. Looks like ipython honors the module-level __dir__() function that was defined in PEP-562 (https://www.python.org/dev/peps/pep-0562/) along with __getattr__().

I'll add these functions to the codegen and update the PR. Hopefully this will also solve the documentation generation issue @emmanuelle mentioned.
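To make lazily loaded names discoverable, a module-level `__dir__` (also defined by PEP 562) can report them alongside the names that are already loaded; per the comment above, IPython uses this when completing. A minimal sketch, with `install_dir` as a hypothetical helper and a synthetic module standing in for a generated package:

```python
import sys
import types


def install_dir(module, lazy_names):
    """Attach a PEP 562 __dir__ that also lists not-yet-imported names."""

    def __dir__():
        return sorted(set(vars(module)) | set(lazy_names))

    module.__dir__ = __dir__


pkg = types.ModuleType("dir_demo")
sys.modules["dir_demo"] = pkg

print("bar" in dir(pkg))  # False: nothing advertises the lazy name yet
install_dir(pkg, ["bar", "layout", "Figure"])
print("bar" in dir(pkg))  # True, even though nothing has been imported
```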

Contributor

Cool! So in principle this shouldn't be different in 3.6?

Contributor

I also have jedi 0.15.1 installed.

Contributor

@jonmmease it'd be awesome if it's possible to have the best of both worlds :-).

@nicolaskruchten I tried conda envs with specific versions of Python on this branch: with py3.6, tab completion works well with go.<TAB>, but with py3.8 it does not. Is it a Linux thing, then? To be continued...

Contributor

Well if it’s broken only on 3.7+ then maybe Jon’s upcoming fix will resolve it! Thanks for checking! My fear was that it would also be broken in 3.6 but if that’s not the case then we’re in luck :)

Contributor Author

OK, IPython tab completion seems to be working well for me now with Python 3.7. Please let me know what you see in your environments! Thanks

Contributor

So there is some progress for me in Python 3.7 (pip env): I can now do go.F<TAB> to get go.Figure, or go.B<TAB> for go.Bar, but I cannot go deeper in the hierarchy; for example, go.bar.M<TAB> does not return anything.

Contributor

@jonmmease are you able to replicate Emma's issues at all locally? Just in a shell with ipython ?

Contributor Author

Huh, this is working for me on Python 3.7 with ipython 7.13 😕


(Despite the environment name, this is Python 3.7 🙈)

@emmanuelle, any improvement in the behavior of documentation generation?

@nicolaskruchten how do things look for you?

Contributor

@jonmmease is this Linux? I'll check locally in a bit

Contributor Author

@jonmmease commented Apr 16, 2020

Yeah, I'm on Linux. It's also working in a plain vanilla Python REPL.


Contributor

No change for me: I can tab-complete to go.Layout.hoverlabel but not through to .bgcolor. Which, BTW, is fine by me.

Contributor

If I instantiate fig = go.Figure(), then I seem to be able to drill down arbitrarily deeply: fig.layout.hoverlabel.font.color. I can also go arbitrarily deeply with go.bar.marker.coloraxis, so long as I stay lowercase and don't switch to, e.g., go.bar.Marker.whatever.

Contributor Author

@jonmmease commented Apr 16, 2020

go.Layout.hoverlabel

Did you mean to use a capital L there? In that case hoverlabel would be the method object and that method object wouldn't be expected to have completion.
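The distinction can be illustrated in plain Python, assuming (for this sketch only) that graph-object properties are ordinary `property` descriptors: looking the name up on the class returns the descriptor itself, which has no child attributes to complete, while looking it up on an instance returns the child object. `Layout` and `Hoverlabel` here are hypothetical stand-ins.

```python
class Hoverlabel:
    """Hypothetical child object."""

    def __init__(self):
        self.bgcolor = None


class Layout:
    """Hypothetical parent exposing the child through a property."""

    def __init__(self):
        self._hoverlabel = Hoverlabel()

    @property
    def hoverlabel(self):
        return self._hoverlabel


on_class = Layout.hoverlabel  # the property descriptor itself
on_instance = Layout().hoverlabel  # an actual Hoverlabel object

print(type(on_class).__name__, hasattr(on_class, "bgcolor"))  # property False
print(type(on_instance).__name__, hasattr(on_instance, "bgcolor"))  # Hoverlabel True
```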

I was looking at submodule imports like this: go.layout.hoverlabel

Contributor

Yeah I tried both and I'm seeing what I would expect.

Contributor Author

Does that Percy failure look familiar? A missing mapbox colorbar only on Chrome? Should I resubmit the job?

Contributor

let's rerun it yeah.

Contributor Author

Ok, merging. Let's open new issues on the tab completion situation as they arise. Thanks all!

@jonmmease merged commit 7fcb95c into master on Apr 16, 2020
@nicolaskruchten added this to the 4.7.0 milestone on Apr 27, 2020
@nicolaskruchten deleted the import_init_optimization branch on June 19, 2020 16:17

Reviewers: @emmanuelle left review comments
Milestone: 4.7.0