Commit b8dfc84

Merge pull request #5149 from rl-utility-man/swarm-plot
add swarm plot to the scatter documentation
2 parents 558016f + d1c1b85 commit b8dfc84

1 file changed (+138 -0 lines changed)

doc/python/line-and-scatter.md

Lines changed: 138 additions & 0 deletions
@@ -284,6 +284,144 @@ fig.update_traces(textposition="bottom right")
fig.show()
```

### Swarm (or Beeswarm) Plots

Swarm plots show the distribution of values in a column by giving each entry its own dot and adjusting the dots' y-values so that they do not overlap and are arranged symmetrically around the y=0 line. They complement [histograms](https://plotly.com/python/histograms/), [box plots](https://plotly.com/python/box-plots/), and [violin plots](https://plotly.com/python/violin/). The example below handles a single column; it could be generalized to multiple categories by computing the swarm y-coordinates for each category separately and then shifting each category's coordinates to its own row.

```python
import collections

import pandas as pd
import plotly.express as px


def negative_1_if_count_is_odd(count):
    # if this is an odd-numbered entry in its bin, make its y coordinate negative;
    # the y coordinate of the first entry is 0, so entries 3, 5, and 7 get
    # negative y coordinates
    if count % 2 == 1:
        return -1
    else:
        return 1


def swarm(
    X_series,
    fig_title,
    point_size=16,
    fig_width=800,
    gap_multiplier=1.2,
    bin_fraction=0.95,  # slightly undersizes the bins to avoid collisions
):
    # sorting will align columns in attractive c-shaped arcs rather than having
    # columns that vary unpredictably in the x-dimension.
    # We also exploit the fact that sorting means we see bins sequentially when
    # we add collision prevention offsets.
    X_series = X_series.copy().sort_values()

    # we need to reason in terms of the marker size, which is measured in px,
    # so we think about each x-coordinate as being a fraction of the way from
    # the minimum X value to the maximum X value
    min_x = X_series.min()
    max_x = X_series.max()
    # avoid dividing by zero when every value is identical
    x_span = (max_x - min_x) or 1

    list_of_rows = []
    # we will count the number of points in each "bin" / vertical strip of the
    # graph to be able to assign a y-coordinate that avoids overlapping
    bin_counter = collections.Counter()

    for x_val in X_series:
        # assign this x_value to a bin number;
        # each bin is a vertical strip slightly narrower than one marker
        bin_index = ((fig_width*bin_fraction*(x_val-min_x))/x_span) // point_size

        # update the count of dots in that strip
        bin_counter.update([bin_index])

        # remember the "y-slot": the number of points in this bin so far, which
        # is sufficient to compute the y coordinate unless there's a collision
        # with the point to its left
        list_of_rows.append(
            {"x": x_val, "y_slot": bin_counter[bin_index], "bin": bin_index})

    # iterate through the points and "offset" any that are colliding with a
    # point to their left; apply the offsets to all subsequent points in the
    # same bin. This arranges points in an attractive swarm c-curve where the
    # points toward the edges are (weakly) further right.
    bin_index = 0
    offset = 0
    for row in list_of_rows:
        if bin_index != row["bin"]:
            # we have moved to a new bin, so we need to reset the offset
            bin_index = row["bin"]
            offset = 0
        # see if we need to "look left" to avoid a possible collision
        for other_row in list_of_rows:
            if other_row["bin"] == bin_index - 1:
                # "bubble" the entry up until we find a slot that avoids a collision
                while ((other_row["y_slot"] == row["y_slot"] + offset)
                       and (((fig_width*(row["x"]-other_row["x"]))/x_span
                             // point_size) < 1)):
                    offset += 1
                    # update the bin count so we know whether the number of
                    # *used* slots is even or odd
                    bin_counter.update([bin_index])

        row["y_slot"] += offset
        # The collision-free y-slot gives the items in a vertical bin
        # y-coordinates that spread evenly above and below the y-axis (we'll
        # make a correction below to deal with even numbers of entries).
        # For now, we assign 0, 1, -1, 2, -2, 3, -3 ... and so on, scaled by
        # point_size*gap_multiplier to get a y coordinate in px.
        row["y"] = ((row["y_slot"] // 2)
                    * negative_1_if_count_is_odd(row["y_slot"])
                    * point_size * gap_multiplier)

    # if the number of points in a bin is even, move its y-coordinates down to
    # put an equal number of entries above and below the axis
    for row in list_of_rows:
        if bin_counter[row["bin"]] % 2 == 0:
            row["y"] -= point_size*gap_multiplier/2

    df = pd.DataFrame(list_of_rows)
    # One way to make this code more flexible, e.g. to handle multiple
    # categories, would be to return a list of "swarmified" y coordinates here
    # and then plot outside the function.
    # That generalization would let you "swarmify" y coordinates for each
    # category and add category-specific offsets to put each category in its
    # own row.

    fig = px.scatter(
        df,
        x="x",
        y="y",
        title=fig_title,
    )
    # we want to suppress the y coordinate in the hover value because the
    # y-coordinate is irrelevant/misleading
    fig.update_traces(
        marker_size=point_size,
        hovertemplate="<b>value</b>: %{x}",
    )
    # we have to set the width and height because we aim to avoid icon
    # collisions and we specify the icon size in the same units as the width
    # and height; vertically, adjacent slots are point_size*gap_multiplier apart
    fig.update_layout(width=fig_width, height=(
        point_size*gap_multiplier*max(bin_counter.values())+200))
    fig.update_yaxes(
        showticklabels=False,  # Turn off y-axis labels
        ticks='',  # Remove the ticks
        title=""
    )
    return fig


df = px.data.iris()  # iris is a pandas DataFrame
fig = swarm(df["sepal_length"], "Sepal length distribution from 150 iris samples")
# The iris data set entries are rounded, so there are no collisions.
# A more interesting test case for collision avoidance is:
# fig = swarm(pd.Series([1, 1.5, 1.78, 1.79, 1.85, 2, 2, 2, 2, 3, 3,
#                        2.05, 2.1, 2.2, 2.5, 12]), "Collision avoidance test")
fig.show()
```
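
For reference, the placement rule that the `swarm` function implements can be written out. With $W$ = `fig_width`, $f$ = `bin_fraction`, $s$ = `point_size`, $g$ = `gap_multiplier`, and $k \ge 1$ the (possibly offset) y-slot of a point within its bin:

$$
b(x) = \left\lfloor \frac{W f \,(x - x_{\min})}{s\,(x_{\max} - x_{\min})} \right\rfloor,
\qquad
y_k = (-1)^{k} \left\lfloor \frac{k}{2} \right\rfloor s g .
$$

So slots $k = 1, 2, 3, 4, 5$ land at $y = 0,\ sg,\ -sg,\ 2sg,\ -2sg$, and when a bin ends up with an even number of used slots, every $y$ in it is lowered by $sg/2$. A point is pushed up one slot when a point in bin $b-1$, with value $x'$, occupies the same slot and $W\,(x - x')/(x_{\max} - x_{\min}) < s$.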
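
The comments in the `swarm` function suggest returning the "swarmified" y-coordinates and plotting outside the function, which is also how a multi-category version could be built. The sketch below assumes that refactor. The hypothetical helper `swarm_y` reimplements the same binning, collision-offset, and even-count steps in plain Python, adding a guard for a zero-width range and an optional shared `x_range` (both assumptions, not part of the example above). The hypothetical `multi_category_swarm` then shifts each category's swarm by a hand-picked `row_spacing`. The pixel arithmetic ignores Plotly's plot margins, so treat it as approximate.

```python
import collections


def swarm_y(values, point_size=16, fig_width=800, gap_multiplier=1.2,
            bin_fraction=0.95, x_range=None):
    """Hypothetical helper: return (x, y) pairs sorted by x, following the
    bin/slot/offset scheme of swarm() but without any plotting."""
    xs = sorted(values)
    # assumption: a shared x_range keeps bins comparable across categories
    min_x, max_x = x_range if x_range is not None else (xs[0], xs[-1])
    span = (max_x - min_x) or 1  # avoid dividing by zero for constant input
    step = point_size * gap_multiplier

    # pass 1: assign each sorted value to a vertical strip and a y-slot
    counts = collections.Counter()
    rows = []
    for x in xs:
        b = (fig_width * bin_fraction * (x - min_x) / span) // point_size
        counts[b] += 1
        rows.append({"x": x, "bin": b, "slot": counts[b]})

    # pass 2: bump a point one slot when the same slot in the bin to its
    # left holds a point less than one marker width away
    current_bin, offset = None, 0
    for row in rows:
        if row["bin"] != current_bin:
            current_bin, offset = row["bin"], 0
        for other in rows:
            if (other["bin"] == current_bin - 1
                    and other["slot"] == row["slot"] + offset
                    and fig_width * (row["x"] - other["x"]) / span < point_size):
                offset += 1
                counts[current_bin] += 1  # one more used slot in this bin
        row["slot"] += offset
        # slots 1, 2, 3, 4, 5 ... map to 0, +1, -1, +2, -2 ... steps
        sign = -1 if row["slot"] % 2 == 1 else 1
        row["y"] = (row["slot"] // 2) * sign * step

    # pass 3: center bins with an even number of used slots
    for row in rows:
        if counts[row["bin"]] % 2 == 0:
            row["y"] -= step / 2
    return [(row["x"], row["y"]) for row in rows]


def multi_category_swarm(groups, row_spacing=200, **kwargs):
    """Hypothetical generalization: one swarm row per category.

    groups maps a category name to its values; row_spacing is an assumed,
    hand-tuned vertical distance between category rows.
    """
    all_values = [v for values in groups.values() for v in values]
    shared_range = (min(all_values), max(all_values))
    points = []
    for i, (category, values) in enumerate(groups.items()):
        for x, y in swarm_y(values, x_range=shared_range, **kwargs):
            points.append((category, x, y + i * row_spacing))
    return points
```

To draw the result, the tuples could be loaded with `pd.DataFrame(points, columns=["category", "x", "y"])` and passed to `px.scatter(df, x="x", y="y", color="category")`, reusing the `update_traces` and `update_layout` settings from the single-column example.
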
## Scatter and line plots with go.Scatter

If Plotly Express does not provide a good starting point, it is possible to use [the more generic `go.Scatter` class from `plotly.graph_objects`](/python/graph-objects/). Whereas `plotly.express` has two functions, `scatter` and `line`, `go.Scatter` can plot both points (markers) and lines, depending on the value of `mode`. The different options of `go.Scatter` are documented in its [reference page](https://plotly.com/python/reference/scatter/).
