use generators and comprehensions instead of lists #1791
Conversation
I don't know what's going wrong with the 2.7 tests... :(
Thanks for the PR! I think there is something generally broken with that 2.7 test, which I will try to fix.
In general, are you able to quantify the performance gains here? Are there any cases where performance would degrade as a result of these changes?
I don't think any of these changes would make the code slower, only potentially faster.
I can give you some simple examples.
Note that `any` is way faster because it can short-circuit as soon as it finds the first `True`.
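A minimal way to see the short-circuiting in isolation: `any` consumes a generator lazily and stops at the first truthy value, so it even terminates on an infinite sequence, while the list-based variant would never finish.

```python
import itertools

# any() consumes the generator lazily and stops at the first truthy
# value, so it terminates even on an infinite sequence.
print(any(x > 5 for x in itertools.count()))  # True, after 7 items

# any([x > 5 for x in itertools.count()])  # would never finish: the
# list comprehension tries to materialize an infinite list first
```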
```python
import time
from contextlib import contextmanager

n = 10000000


@contextmanager
def time_this(label):
    """Fancy code block timer."""
    start = time.perf_counter_ns()
    try:
        yield
    finally:
        end = time.perf_counter_ns()
        print(f"TIME {label}: {(end - start) / 1000000000} sec")


with time_this("append"):
    lst = []
    for x in range(n):
        if x % 2 == 0:
            lst.append(x)

with time_this("comprehension"):
    lst = [x for x in range(n) if x % 2 == 0]

with time_this("list call"):
    lst = list(x for x in range(n) if x % 2 == 0)

with time_this("any with list"):
    y = any([x % 10 == 0 for x in lst])

with time_this("any with generator"):
    y = any(x % 10 == 0 for x in lst)

with time_this("min with list"):
    z = min([x % 10 == 0 for x in lst])

with time_this("min with generator"):
    z = min(x % 10 == 0 for x in lst)
```
Output:

```
TIME append: 1.4062098 sec
TIME comprehension: 0.9040005 sec
TIME list call: 1.0192474 sec
TIME any with list: 0.4480044 sec
TIME any with generator: 7.6e-06 sec (!)
TIME min with list: 0.542667 sec
TIME min with generator: 0.518529 sec
```
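(The pattern in these numbers: `any` with a generator is orders of magnitude faster because it stops at the first match, while `min` has to consume every element either way, so its generator version saves memory but almost no time.)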
Yep, the principle makes sense, but I'm interested in seeing what real-world applications of plotly are sped up by this PR... Do you have any examples of figures that are slow to generate today and faster as a result of this PR?
No, what would be the way to test this?
Similar to your code snippet above, just using plotly instead of synthetic examples :)
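Such a benchmark might look like the following sketch (the figure choice is arbitrary, purely for illustration):

```python
import time

import plotly.graph_objects as go

start = time.perf_counter()
fig = go.Figure(go.Scatter(x=list(range(100000)), y=list(range(100000))))
fig.to_json()  # force full validation and serialization of the figure
print(f"figure build + serialize: {time.perf_counter() - start:.3f} sec")
```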
I don't know which part of plotly I improved. Think of this more as a code-cleanup PR with potential speedups than as a performance PR.
I don't see what is called where, how many times it is called, or where the speedups should show up.
If you tell me what kind of plot to create, I can test this.
OK. While I thank you for your work and interest in maintaining our codebase, I am hesitant to merge this change, because some of the parts of the code this PR touches are not all that well-tested, and most of the changes you are proposing refactor away lists which in basically every case have 10-100 elements, so the absolute speedups in real-world use cases are likely to be very small. As a result, the potential benefits of this PR don't really outweigh the risks of merging it, from my perspective today. Re testing, I should note that I've resolved the previous issue with the 2.7 tests, and the current failure is likely due to changes from this PR.
If you're able to point to specific figures (say some of the sample figures in our documentation at https://plot.ly/python) and find significant speedups this might change the conversation, however.
Your fix of the 2.7 tests made it possible for me to fix an actual bug; it all works now.
@nicolaskruchten I think it's easier to look at the functions being changed than at plots.
For instance, take the `endpoints_to_interval` function, which is changed here (last).
It is easy to see that the function is now cleaner. And it happens to be faster (but, of course, we are talking about microseconds here).
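A sketch of what such a rewrite might look like; the function name is taken from the thread, but the exact signature and behavior in plotly's codebase may differ:

```python
def endpoints_to_interval(endpts):
    """Turn a sorted sequence of endpoints into closed intervals,
    with -inf prepended (illustrative reconstruction only)."""
    bounds = [float("-inf")] + list(endpts)
    # pair each bound with its successor instead of indexing by k
    return [[lo, hi] for lo, hi in zip(bounds, bounds[1:])]


print(endpoints_to_interval([1, 2, 3]))  # [[-inf, 1], [1, 2], [2, 3]]
```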
I would suggest modifying it to:
```python
# add -inf to intervals
intervals = [[float("-inf"), endpts[0]]]
for k in range(length - 1):
    intervals.append([endpts[k], endpts[k + 1]])
```
You looked at outdated code; it is already changed in the current version of the pull request.
Why not simply set `wdw_size = 3`? Or why even create this variable? It's only used in one place.
I tried to change as little as possible. There might be a reason (readability, future use) why this variable was there before.
I think this can be simplified to:
```python
all_orders = [D['order'] for D in columns_or_json["cols"].values()]
```

but I'm not sure what `columns_or_json` looks like. I guess it's a dictionary.
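For example, assuming `columns_or_json` has a shape like the following (a guess for illustration only; the real structure in the codebase may differ):

```python
# Hypothetical shape of columns_or_json, assumed for illustration only
columns_or_json = {
    "cols": {
        "first": {"order": 0, "data": [1, 2, 3]},
        "second": {"order": 1, "data": [4, 5, 6]},
    }
}

all_orders = [D["order"] for D in columns_or_json["cols"].values()]
print(all_orders)  # [0, 1]
```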
You are right. But please don't mix single and double quotes :)
You are right; I changed it there and below.
> but, of course, we are talking about microseconds here
So this is kind of my concern... It doesn't seem to be worth the risk of introducing a new bug in reasonably-working, marginally-tested code for this kind of improvement, either in performance or in style.
@nicolaskruchten I stumbled across this PR when trying to solve my own problem. It's quite possible that Plot.ly animations aren't the right tool for this, but I wanted to animate a scatterplot of 200 points for 1000+ frames. I used this page as a starting point.
Having to pre-generate the frames means I have to wait for nearly a minute before my graph shows up. Is there some other way to generate frames during an animation, instead of pre-generating all frames before the animation? Passing a generator to frames would be the ideal solution, but I didn't see any reference to it here.
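For reference, the pre-generated-frames pattern described above looks roughly like this (the figure details are illustrative, not taken from the thread):

```python
import numpy as np
import plotly.graph_objects as go

n_points, n_frames = 200, 50  # the use case above had 1000+ frames

# Every frame is built up front, before the figure exists; this is
# where the long wait before the animation appears comes from.
frames = [
    go.Frame(
        data=[go.Scatter(x=np.random.rand(n_points),
                         y=np.random.rand(n_points),
                         mode="markers")]
    )
    for _ in range(n_frames)
]

fig = go.Figure(data=frames[0].data, frames=frames)
```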
SankarGaneshb commented on Mar 24, 2024:
@tweakimp I like this pull request as a cleanup of the existing code, and as an idea for performance with very big data.
@nicolaskruchten True, the impact it is going to make is very minimal, and moreover, since the Chart Studio package has been moved to a different repository, this pull request might no longer be needed.
@nckswt As per @tweakimp's numbers, even the millisecond-level improvements appear at a count of 10000000; you were talking about only 200000+ items, so this change might not be of great help.
This pull request can be marked as no longer required.
Thanks.
Hey @tweakimp! After reading through this discussion, I think I'll close this. It seems like it would be a good idea to use these more, but it might not be worth the risks associated with changing this much logic. Thank you for this PR though!
I tried to improve speed by preventing the creation of lists that are not needed and by using generators and comprehensions instead. In some cases this could mean significant speedups.