
I have some code that manipulates a pandas DataFrame containing COVID-19 vaccine data and plots it with Matplotlib.

The data is here: https://covid.ourworldindata.org/data/owid-covid-data.csv (downloads CSV).

I have filtered the data so that it only shows countries whose current vaccinations-per-hundred rate is at least 10. It can't simply drop every row with a rate below 10; it has to go through each country, get that country's latest vaccinations-per-hundred rate, and, if that rate is below 10, remove the country from the graph. The main country (United States) is always kept.
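To make the rule concrete, here is a minimal sketch on invented data (the countries `A` and `B` and all of the numbers are made up, not taken from the OWID file). It picks each country's most recent row with `groupby(...).idxmax()` on the date, which is just one vectorized way to express the rule, not the code used in this question. `B` is dropped even though it once reached 15, because only its latest value (9) counts, while `United States` is kept as the main country despite ending at 7.

```python
import pandas as pd

# Invented toy data standing in for the OWID CSV (names and values are hypothetical)
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-03-01", "2021-03-02",
                            "2021-03-01", "2021-03-02",
                            "2021-03-01", "2021-03-02"]),
    "location": ["A", "A", "B", "B", "United States", "United States"],
    "total_vaccinations_per_hundred": [8.0, 12.0, 15.0, 9.0, 5.0, 7.0],
})
main_country = "United States"

# One row per country: idxmax returns the index label of each country's latest date
latest = df.loc[df.groupby("location")["date"].idxmax()]

# Keep a country if its latest rate is at least 10, or if it is the main country
keep_mask = (latest["total_vaccinations_per_hundred"] >= 10) | (latest["location"] == main_country)
kept = latest.loc[keep_mask, "location"].tolist()
print(kept)  # ['A', 'United States']
```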

This is highly time-sensitive and needs to be done as quickly as possible.

Code:

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, WeekdayLocator
import datetime
df = pd.read_csv(
    "https://covid.ourworldindata.org/data/owid-covid-data.csv",
    usecols=["date", "location", "total_vaccinations_per_hundred"],
    parse_dates=["date"])
df = df[df["total_vaccinations_per_hundred"].notna()]
countries = df["location"].unique().tolist()
countries_copy = countries.copy()
main_country = "United States"
for country in countries:
    df_with_current_country = df[df["location"] == country]
    # Mask with this country's own "date" column; masking with df["date"] forces a reindex and a warning
    latest_rows = df_with_current_country[
        df_with_current_country["date"] == df_with_current_country["date"].max()]
    if latest_rows["total_vaccinations_per_hundred"].tolist()[0] < 10:
        if country != main_country:
            countries_copy.remove(country)
countries = countries_copy
df = df[df["location"].isin(countries)]
pivot = pd.pivot_table(
    data=df,  # What dataframe to use
    index="date",  # The "rows" of your dataframe
    columns="location",  # What values to show as columns
    values="total_vaccinations_per_hundred",  # What values to aggregate
    aggfunc="mean",  # How to aggregate data
)
pivot = pivot.ffill()  # Forward-fill gaps; fillna(method="ffill") is deprecated in recent pandas
# Step 4: Plot all countries
fig, ax = plt.subplots(figsize=(12, 8))
fig.patch.set_facecolor("#F5F5F5")  # Change background color to a light grey
ax.patch.set_facecolor("#F5F5F5")  # Change background color to a light grey
for country in countries:
    if country == main_country:
        country_color = "#129583"
        alpha_color = 1.0
    else:
        country_color = "grey"
        alpha_color = 0.75
    ax.plot(
        pivot.index,  # What to use as your x-values
        pivot[country],  # What to use as your y-values
        color=country_color,  # How to color your line
        alpha=alpha_color  # What transparency to use for your line
    )
    if country_color != "grey":
        ax.text(
            x=pivot.index[-1] + datetime.timedelta(days=2),  # Where to position your text relative to the x-axis
            y=pivot[country].max(),  # How high to position your text
            color=country_color,  # What color to give your text
            s=country,  # What to write
            alpha=alpha_color  # What transparency to use
        )
# Step 5: Configure axes
## A) Format what shows up on axes and how it's displayed
date_form = DateFormatter("%Y-%m-%d")
ax.xaxis.set_major_locator(WeekdayLocator(byweekday=0, interval=1))  # One tick every Monday
ax.xaxis.set_major_formatter(date_form)
plt.xticks(rotation=45)
plt.ylim(0, 100)
## B) Customizing axes and adding a grid
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["bottom"].set_color("#3f3f3f")
ax.spines["left"].set_color("#3f3f3f")
ax.tick_params(colors="#3f3f3f")
ax.grid(alpha=0.1)
## C) Adding a title and axis labels
plt.ylabel("Total Vaccinations per 100 People", fontsize=12, alpha=0.9)
plt.xlabel("Date", fontsize=12, alpha=0.9)
plt.title("COVID-19 Vaccinations over Time", fontsize=18, weight="bold", alpha=0.9)
## D) Celebrate!
plt.show()
```
Sᴀᴍ Onᴇᴌᴀ
asked Mar 24, 2021 at 19:12

1 Answer


The loop and countries section can be replaced with a single DataFrameGroupBy.filter:

  • .groupby('location') - group by country
  • .sort_values('date') - sort country by date (newest at end)
  • .tail(1) >= 10 - only keep countries whose newest rate is at least 10
  • | (country.name == main_country) - always keep main_country
```python
df2 = df.copy()  # deep copy original df before loop (only to compare later)
df2 = df2.groupby('location').filter(lambda country:
    (country.sort_values('date').total_vaccinations_per_hundred.tail(1) >= 10)
    | (country.name == main_country)
)
#        location       date  total_vaccinations_per_hundred
# 1930    Andorra 2021-01-25                            0.75
# 1937    Andorra 2021-02-01                            1.34
# ...         ...        ...                             ...
# 74507   Uruguay 2021-03-26                           14.31
# 74508   Uruguay 2021-03-27                           14.68
#
# [3507 rows x 3 columns]
```
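To try the `filter` approach without downloading the full CSV, here is a minimal sketch on a small invented DataFrame (countries `A` and `B` and all values are made up). The logic is the same, but as a small variation the hypothetical helper `keep_country` returns a plain `bool` via `.iloc[-1]` instead of relying on `filter` squeezing the one-row Series that `.tail(1)` produces. The rows for `A` are deliberately out of date order to show why the `sort_values('date')` step matters.

```python
import pandas as pd

# Invented toy data (hypothetical): A's rows are out of date order on purpose
df = pd.DataFrame({
    "location": ["A", "A", "B", "B", "United States", "United States"],
    "date": pd.to_datetime(["2021-03-02", "2021-03-01",
                            "2021-03-01", "2021-03-02",
                            "2021-03-01", "2021-03-02"]),
    "total_vaccinations_per_hundred": [12.0, 8.0, 15.0, 9.0, 5.0, 7.0],
})
main_country = "United States"

def keep_country(country: pd.DataFrame) -> bool:
    # Latest rate for this group; .iloc[-1] gives a scalar rather than a 1-row Series
    latest_rate = country.sort_values("date")["total_vaccinations_per_hundred"].iloc[-1]
    # filter() sets country.name to the group key (the location)
    return bool(latest_rate >= 10) or country.name == main_country

filtered = df.groupby("location").filter(keep_country)
print(filtered["location"].unique().tolist())  # ['A', 'United States']
```

`filter` returns the surviving rows with their original index labels, so the result can be used anywhere the looped `df` was used.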

If we then run the original loop (the countries section), we can verify that the filtered df2 matches the looped df:

```python
df2.equals(df)  # compare with df after loop
# True
```
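The same equivalence check can be reproduced offline. The sketch below runs a condensed version of the question's loop and the answer's `filter` on the same invented toy frame (names `A`, `B`, `kept`, and `df_loop` are hypothetical) and compares the two results with `DataFrame.equals`.

```python
import pandas as pd

# Invented toy data; "B" was above 10 earlier but its latest value is below 10
df = pd.DataFrame({
    "location": ["A", "A", "B", "B", "United States", "United States"],
    "date": pd.to_datetime(["2021-03-01", "2021-03-02"] * 3),
    "total_vaccinations_per_hundred": [8.0, 12.0, 15.0, 9.0, 5.0, 7.0],
})
main_country = "United States"

# groupby.filter version, as in the answer
df2 = df.groupby("location").filter(
    lambda country: (country.sort_values("date").total_vaccinations_per_hundred.tail(1) >= 10)
    | (country.name == main_country)
)

# Condensed loop version of the question's logic
kept = []
for country in df["location"].unique().tolist():
    rows = df[df["location"] == country]
    latest = rows.loc[rows["date"] == rows["date"].max(), "total_vaccinations_per_hundred"].iloc[0]
    if latest >= 10 or country == main_country:
        kept.append(country)
df_loop = df[df["location"].isin(kept)]

print(df2.equals(df_loop))  # True
```

`equals` also compares the index, which works here because `filter` keeps the surviving rows in their original order with their original labels, exactly like the boolean mask in the loop version.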
answered Mar 28, 2021 at 17:29
  • Nice answer; small comment: the original code kept main_country even if it had less than 10 vaccinations per 100. Commented Mar 29, 2021 at 4:06
  • @RootTwo Ah good catch, fixed. Commented Mar 29, 2021 at 4:28
