Manipulating Pandas Dataframe with vaccination data from CSV to display on matplotlib

Question 1

I have some code that manipulates a pandas DataFrame containing Covid-19 vaccine data and plots it with Matplotlib.

The data is here: https://covid.ourworldindata.org/data/owid-covid-data.csv (downloads CSV).

I have manipulated the data so that it only shows countries whose current vaccinations-per-hundred rate is at least 10. It can't simply drop every row with a rate below ten: it has to go through each country, get that country's latest vaccinations-per-hundred rate, and remove the country from the graph if that rate is less than ten.

This is highly time-sensitive and needs to be done as quickly as possible.

Code:

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, WeekdayLocator
import datetime
df = pd.read_csv(
    "https://covid.ourworldindata.org/data/owid-covid-data.csv",
    usecols=["date", "location", "total_vaccinations_per_hundred"],
    parse_dates=["date"])
df = df[df["total_vaccinations_per_hundred"].notna()]
countries = df["location"].unique().tolist()
countries_copy = countries.copy()
main_country = "United States"
for country in countries:
    if country in countries:
        df_with_current_country = df[df["location"] == country]
        # Row(s) for this country's most recent date
        latest_rows = df_with_current_country[
            df_with_current_country["date"] == df_with_current_country["date"].max()]
        if latest_rows["total_vaccinations_per_hundred"].tolist()[0] < 10:
            if country != main_country:
                countries_copy.remove(country)
countries = countries_copy
df = df[df["location"].isin(countries)]
pivot = pd.pivot_table(
    data=df, # What dataframe to use
    index="date", # The "rows" of your dataframe
    columns="location", # What values to show as columns
    values="total_vaccinations_per_hundred", # What values to aggregate
    aggfunc="mean", # How to aggregate data
)
pivot = pivot.ffill() # Forward-fill gaps (fillna(method="ffill") is deprecated in recent pandas)
# Step 4: Plot all countries
fig, ax = plt.subplots(figsize=(12,8))
fig.patch.set_facecolor("#F5F5F5") # Change background color to a light grey
ax.patch.set_facecolor("#F5F5F5") # Change background color to a light grey
for country in countries:
    if country == main_country:
        country_color = "#129583"
        alpha_color = 1.0
    else:
        country_color = "grey"
        alpha_color = 0.75
    ax.plot(
        pivot.index, # What to use as your x-values
        pivot[country], # What to use as your y-values
        color=country_color, # How to color your line
        alpha=alpha_color # What transparency to use for your line
    )
    if country_color != "grey":
        ax.text(
            x = pivot.index[-1] + datetime.timedelta(days=2), # Where to position your text relative to the x-axis
            y = pivot[country].max(), # How high to position your text
            color = country_color, # What color to give your text
            s = country, # What to write
            alpha=alpha_color # What transparency to use
        )
# Step 5: Configures axes
## A) Format what shows up on axes and how it's displayed
date_form = DateFormatter("%Y-%m-%d")
ax.xaxis.set_major_locator(WeekdayLocator(byweekday=(0), interval=1))
ax.xaxis.set_major_formatter(date_form)
plt.xticks(rotation=45)
plt.ylim(0,100)
## B) Customizing axes and adding a grid
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["bottom"].set_color("#3f3f3f")
ax.spines["left"].set_color("#3f3f3f")
ax.tick_params(colors="#3f3f3f")
ax.grid(alpha=0.1)
## C) Adding a title and axis labels
plt.ylabel("Total Vaccinations per 100 People", fontsize=12, alpha=0.9)
plt.xlabel("Date", fontsize=12, alpha=0.9)
plt.title("COVID-19 Vaccinations over Time", fontsize=18, weight="bold", alpha=0.9)
## D) Celebrate!
plt.show()

Answer

The loop and countries section can be replaced with a single DataFrameGroupBy.filter:

.groupby('location') - group by country
.sort_values('date') - sort country by date (newest at end)
.tail(1) >= 10 - only keep countries whose newest rate is at least 10
| (country.name == main_country) - always keep main_country

df2 = df.copy() # deep copy original df before loop (only to compare later)
df2 = df2.groupby('location').filter(lambda country:
    (country.sort_values('date').total_vaccinations_per_hundred.tail(1) >= 10)
    | (country.name == main_country)
)
#         location       date  total_vaccinations_per_hundred
# 1930     Andorra 2021-01-25                            0.75
# 1937     Andorra 2021-02-01                            1.34
# ...          ...        ...                             ...
# 74507    Uruguay 2021-03-26                           14.31
# 74508    Uruguay 2021-03-27                           14.68
#
# [3507 rows x 3 columns]
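
To make the filter easier to check by hand, here is a minimal sketch of the same idea on a tiny frame. The data, the location names "A", "B", "C", and the helper name keep_country are all made up for illustration; only the column names and main_country come from the question. It returns a plain bool per group instead of a one-row Series, which is just a variation on the lambda above, not a required change.

```python
import pandas as pd

# Made-up toy data for illustration (not taken from the OWID file).
toy = pd.DataFrame({
    "location": ["A", "A", "B", "B", "C", "C"],
    "date": pd.to_datetime([
        "2021-03-02", "2021-03-01",  # deliberately out of order
        "2021-03-01", "2021-03-02",
        "2021-03-01", "2021-03-02",
    ]),
    "total_vaccinations_per_hundred": [12.0, 5.0, 1.0, 2.0, 3.0, 4.0],
})
main_country = "C"  # hypothetical stand-in for "United States"


def keep_country(country):
    """Return True if this location's group should stay in the frame."""
    # Newest rate for this group, as a scalar rather than a one-row Series.
    latest = country.sort_values("date")["total_vaccinations_per_hundred"].iloc[-1]
    # filter() sets .name on each group to the group key (the location).
    return bool(latest >= 10) or country.name == main_country


kept = toy.groupby("location").filter(keep_country)
print(kept["location"].unique())
```

Here "A" stays because its newest rate is 12, "B" is dropped because its newest rate is 2, and "C" stays only because it is main_country.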

And if we run the countries section, we can verify that the filtered df2 matches the looped df:

df2.equals(df) # compare with df after loop
# True
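
Since the question asks for speed, one possible alternative is to avoid calling a Python function per group at all: take the newest row of every location in one pass, build the set of locations to keep, and select with isin. This is a sketch on made-up toy data, assuming the same column names as the question; the names threshold, latest, keep, and filtered are my own, and I have not timed it against the real CSV.

```python
import pandas as pd

# Made-up toy data for illustration (not taken from the OWID file).
toy = pd.DataFrame({
    "location": ["A", "A", "B", "B", "C", "C"],
    "date": pd.to_datetime([
        "2021-03-02", "2021-03-01",  # deliberately out of order
        "2021-03-01", "2021-03-02",
        "2021-03-01", "2021-03-02",
    ]),
    "total_vaccinations_per_hundred": [12.0, 5.0, 1.0, 2.0, 3.0, 4.0],
})
main_country = "C"  # hypothetical stand-in for "United States"
threshold = 10      # assumed cutoff, matching the question

# Newest row per location: sort by date, then take the last row of each group.
latest = toy.sort_values("date").groupby("location").tail(1)

# Locations whose newest rate meets the threshold, plus main_country.
keep = set(latest.loc[latest["total_vaccinations_per_hundred"] >= threshold, "location"])
keep.add(main_country)

# Keep every row (not just the newest) of the surviving locations.
filtered = toy[toy["location"].isin(keep)]
print(sorted(keep))
```

On the toy frame this keeps the same rows as the groupby filter approach; whether it is actually faster on the full dataset would need to be measured.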

Comment

Nice answer; small comment: the original code kept main_country even if it had less than 10 vaccinations per 100.

Comment

@RootTwo Ah good catch, fixed.

Answer by tdy (accepted), 2021-03-28 17:29:57Z

