-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
harmonize ECDF between "h" and "v", removed "reversed" mode #3407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot one "." here...
Thanks for this PR! I'll be able to take a look at it next week, before the next release :)
Have a nice weekend!
One quick note is that I will not be able to merge this as-is given that it removes functionality. We still need to accept reversed
as an argument for backwards-compatibility. If you want to propose to make this equivalent to complementary
we'd need to implement and document that in the docstring :)
@nicolaskruchten, who would be able to perform the code review?
Hello @eisenlohr! I'm sorry for the long delays in responding here, I've gotten very far behind in my work! I will try to review this before the end of the year.
I do have some hesitations about the idea of merging reverse and complementary, as they don't really mean the same thing... from the docstring:
If 'standard', the ECDF is plotted such that values represent data at or below the point.
If 'complementary', the CCDF is plotted such that values represent data above the point.
If 'reversed', a variant of the CCDF is plotted such that values represent data at or above the point.
The complementary ECDF is 1-ECDF, so the definitions of standard and complementary are correct here. I added reverse mode because, indeed, there was a missing point.
As to the idea of adding an extra point to the series, I am also ambivalent, as this point is not in the underlying data, and will cause duplicated hoverlabels: the 'extra' point will have the same hover_data
as the point it was copied from, even though it's not a "real" data point.
Hi @nicolaskruchten,
Thanks for giving this some thought.
My MR addresses a couple of (somewhat independent) aspects.
-
I would expect the output of
h x
(orv x
) to be a transpose ofv y
(orh y
) when using the same input data. This was not the case in the current implementation (see diagrams above). -
The standard definition of cumulative probability is based on 'up to and including'. I think we are in violent agreement that what is called 'standard' and 'complementary' should be (and is) a mirror image. Where I deviate is in seeing a meaningful reason for modifying the classical definition and using 'up to but excluding', i.e. what was called 'reversed'. In my opinion, there is only one way to use and interpret the cumulative probability: if it is P for up to and including x, then it is 1-P for down to and excluding x. Therefore, no need for 'reversed'. Moreover, for the current (your) implementation, there is essentially no difference (except for the missing "closure" at either 1 or 0) between the choice of 'complementary' and 'reversed' in case of the (in my opinion principally correct)
v x
plot:
standard_vx
complementary_vx
reversed_vx -
Spanning the full range from zero to one appeared to be the usual way for ECDF plots. If one would be really pedantic, the plots should actually extend from -infty at zero cum probabilty to +infty at 1 cum probabilty, i.e. have horizontal (in the
standard v x
plot) "legs" at the bottom left and top right. Ignoring this detail, the only option that I was able to come up with was to add a copy of the lowest value at a cum prob of zero. In light of what I wrote above, the more appropriate choice might actually be -infty. However, then the plot would look lopsided, since only one of the mentioned "legs" is present... The cleanest look resulted from the choice I proposed in the MR. I agree that the hover information would be a copy, but am not certain whether this would actually be a serious concern?
In summary, I believe that most of the confusion arose due to "1." not being correct, i.e. the current implementation does not yield properly transposed images. Once this is fixed, then it becomes apparent that there is no difference anymore between plots that would result from 'complementary' compared to 'reversed'.
Hi Philip,
I sincerely apologize for letting this languish for so long... I've been a bit distracted by various life events recently (I've just started graduate school!) so I'm behind in Plotly.py work like reviewing PRs. I will definitely try to merge some or all of this PR in an upcoming release :)
Cheers,
Nicolas
Hi @nicolaskruchten, to lend some further support to the argument I made regarding item 3 above, the ECDF is plotted in the same way, i.e. spanning fully between 0 and 1, in Seaborn (see second-to-last example at https://seaborn.pydata.org/generated/seaborn.displot.html#seaborn.displot).
Hi @nicolaskruchten, I am trying another ping at this.
The Seaborn image I was referring to in my previous post is this:
@nicolaskruchten, there is another argument that I would like to stress: the cumulative probability for a single-valued population should be a step function at that value.
This is indeed reflected in the output of Seaborn, but not (yet) in plotly.express:
import pandas as pd import seaborn as sns import plotly.express as px df = pd.DataFrame(dict(x=[1])) sns.displot(data=df, x='x', kind='ecdf', ).savefig('ECDF_seaborn.png') px.ecdf(df, x='x', ).write_image('ECDF_plotlyexpress.png')
Seaborn output
Ploty.express output
Thank you for your persistence with this issue! I'm still quite embarassed at how long it has been open and how hard it has been for me to find time to properly consider it and I continue to apologize. Your arguments make sense, so my silence is not intended as passive disagreement!
@nicolaskruchten, no offense taken. I am aware that there is always more to do than hours in a day!
Uh oh!
There was an error while loading. Please reload this page.
Changes
px.ecdf
functionalityI changed three aspects of the ECDF functionality:
'complementary
').x=values,y=weights,orientation='v'
is the EXACT transpose ofy=values,x=weights,orientation='h'
.This symmetry was lacking in the former implementation as can be seen in the examples mentioned below.
'reversed'
option has been removed as it was essentially a copy of'complementary'
with the opposite end omitted.Example plots of original functionality:
original output
Example plots of updated functionality now exhibiting proper symmetry:
updated output
Above images were produced with
px
doc-stringsCode PR
plotly.graph_objects
, my modifications concern thecodegen
files and not generated files.modified existing tests.
new tutorial notebook (please see the doc checklist as well).