\$\begingroup\$

The code below determines, for each US region, the most common main dish and the most common method of preparing that dish. It uses data from 'thanksgiving-2015-poll-data.csv', which can be found on GitHub.

I believe that a pivot_table might offer a more efficient way of getting the same information, but I cannot figure out how to do it. Can anyone offer any insight? Here is the code I used. It works, but I feel it is not the best (fastest) way of doing this.

import pandas as pd
data = pd.read_csv('thanksgiving-2015-poll-data.csv', encoding="Latin-1")
regions = data['US Region'].value_counts().keys()
main_dish = data['What is typically the main dish at your Thanksgiving dinner?']
main_dish_prep = data['How is the main dish typically cooked?']
regional_entire_meal_data_rows = []
for region in regions:
    is_in_region = data['US Region'] == region
    most_common_regional_dish = main_dish[is_in_region].value_counts().keys().tolist()[0]
    is_region_and_most_common_dish = (is_in_region) & (main_dish == most_common_regional_dish)
    most_common_regional_dish_prep_type = main_dish_prep[is_region_and_most_common_dish].value_counts().keys().tolist()[0]
    regional_entire_meal_data_rows.append((region, most_common_regional_dish, most_common_regional_dish_prep_type))
labels = ['US Region', 'Most Common Main Dish', 'Most Common Prep Type for Main Dish']
regional_main_dish_data = pd.DataFrame(regional_entire_meal_data_rows, columns=labels)
full_meal_message = '''\n\nThe table below shows a breakdown of the most common 
full Thanksgiving meal broken down by region.\n'''
print(full_meal_message)
print(regional_main_dish_data)
Stephen Rauch
asked Jul 19, 2017 at 13:25
\$\endgroup\$

1 Answer

\$\begingroup\$

I have recast your loop, and the code is below. I will discuss a couple of points.

pandas.DataFrame.groupby() allows working with one group at a time

Your current code works with the entire dataframe for each region. Pandas provides groupby so that you can work with one region's data at a time. I don't know if it is any faster, but to my eye it is easier to read.

desired_cols = [region_col, main_dish_col, main_dish_prep_col]
for region, group in df[desired_cols].groupby(region_col):
    ...
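To make the iteration concrete, here is a minimal sketch on a made-up frame. The `dish` column and every value in it are placeholders for illustration, not the poll data. Each pass of the loop receives the group key and a sub-frame holding only that group's rows.

```python
import pandas as pd

# Toy data: column 'dish' and all values are invented for illustration.
toy = pd.DataFrame({
    'US Region': ['South', 'South', 'Pacific', 'Pacific', 'Pacific'],
    'dish': ['Turkey', 'Ham', 'Turkey', 'Turkey', 'Tofurkey'],
})

# groupby yields (key, sub-frame) pairs; keys come out sorted by default.
group_sizes = {}
for region, group in toy.groupby('US Region'):
    group_sizes[region] = len(group)

print(group_sizes)
```

Note that groupby sorts the keys by default, so the regions may come out in a different order than with value_counts().keys(), which orders them by frequency.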

Using pandas.Series

A pandas.Series is a data structure that is basically two vectors. One vector is the data, the other is the Index. In this code:

main_dish[is_in_region].value_counts().keys().tolist()[0]

.value_counts() returns a Series whose index holds the distinct values, sorted by descending count. You then ask for the keys(), turn that into a list, and then take the first element. This is more naturally done by just taking the first element of the index, like:

.value_counts().index[0]
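As a quick check that the two spellings agree, here is a small sketch on an invented Series (the values are placeholders, chosen so there is no tie for first place). It also shows idxmax() as another way to read off the most frequent value.

```python
import pandas as pd

# Invented sample; 'Turkey' is the clear winner so ties do not matter.
dishes = pd.Series(['Turkey', 'Ham', 'Turkey', 'Tofurkey', 'Turkey'])

counts = dishes.value_counts()        # Series indexed by the distinct values

via_list = counts.keys().tolist()[0]  # the original, longer spelling
via_index = counts.index[0]           # the shorter equivalent
via_idxmax = counts.idxmax()          # label of the largest count

print(via_list, via_index, via_idxmax)
```

When two values are tied for the highest count, which one comes first is less obvious, so it is worth deciding how ties should be handled before relying on any of these.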

Main Loop Code:

import pandas as pd

df = pd.read_csv('thanksgiving-2015-poll-data.csv', encoding="Latin-1")
region_col = 'US Region'
main_dish_col = 'What is typically the main dish at your Thanksgiving dinner?'
main_dish_prep_col = 'How is the main dish typically cooked?'
desired_cols = [region_col, main_dish_col, main_dish_prep_col]
regional_entire_meal_data_rows = []
# groupby yields each region together with only that region's rows
for region, group in df[desired_cols].groupby(region_col):
    main_dish = group[main_dish_col]
    main_dish_prep = group[main_dish_prep_col]
    # value_counts() sorts by frequency, so index[0] is the most common value
    most_common_dish = main_dish.value_counts().index[0]
    # restrict the prep types to rows where the most common dish was served
    prep_types = main_dish_prep[main_dish == most_common_dish]
    most_common_prep_type = prep_types.value_counts().index[0]
    regional_entire_meal_data_rows.append(
        (region, most_common_dish, most_common_prep_type))
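If you would rather not write the loop yourself, one possible variation is to hand a small function to groupby(...).apply(). This is only a sketch on made-up data: the short column names `region`, `dish` and `prep`, the helper names `most_common` and `summarize`, and all the values are assumptions for illustration. I have not timed it against the loop, and it is not a pivot_table solution.

```python
import pandas as pd

# Toy stand-in for the poll data; names and values are invented.
toy = pd.DataFrame({
    'region': ['South', 'South', 'South', 'Pacific', 'Pacific', 'Pacific'],
    'dish':   ['Turkey', 'Turkey', 'Ham', 'Tofurkey', 'Tofurkey', 'Turkey'],
    'prep':   ['Fried', 'Fried', 'Baked', 'Baked', 'Baked', 'Roasted'],
})

def most_common(series):
    """Return the most frequent value (raises IndexError if all are NaN)."""
    return series.value_counts().index[0]

def summarize(group):
    """Most common dish in the group, and the most common prep for that dish."""
    dish = most_common(group['dish'])
    prep = most_common(group.loc[group['dish'] == dish, 'prep'])
    return pd.Series({'Most Common Main Dish': dish,
                      'Most Common Prep Type for Main Dish': prep})

# One row per region; the region becomes the index of the result.
summary = toy.groupby('region')[['dish', 'prep']].apply(summarize)
print(summary)
```

The per-region logic is the same as in the loop above. The difference is only that pandas assembles the result frame, so the list of tuples and the separate DataFrame construction are no longer needed.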
answered Jul 19, 2017 at 16:43
\$\endgroup\$
  • \$\begingroup\$ Thanks, this is exactly the feedback I was looking for. At this point I consider myself an advanced beginner, and I am always looking for "better" ways to write my Python code. Note, you are right: both versions run in about 46 milliseconds. \$\endgroup\$ Commented Jul 20, 2017 at 3:52
