3
\$\begingroup\$

The goal is to transpose the excel file that comes in the following form

enter image description here

into this

enter image description here

The code I created works properly, but I think it can be improved.

I would also like to avoid using reduce to avoid importing functools.

#importing the libraries
import pandas as pd 
from functools import reduce
 
#defining the mapping list
descriptions_pizzeria = [
 "Pizzeria Da Michele - Napoli",
 "Pizzeria Da Michele - Roma",
 "Pizzeria Da Michele - Bologna",
 "Pizzeria Da Michele - Londra",
 ]
pizzeria_code = [
 1,
 2,
 3
 ]
#path where the data are stored
url_path = "https://github.com/uomodellamansarda/AC_StackQuestions/blob/main/Pizzerie.xlsx? raw=true" 
#reading the data
df = pd.read_excel(url_path, index_col=0, header=3)
#printing to have an idea
print(df)
#from an inspection we now that 
#the first column is where the Pizzeria code is stored
#but we need to rename it
#we need to rename the first column
df = df.rename(columns={"Unnamed: 1":"Pizzeria code"})
print(df.info())
#Grouping based on the pizzeria code
G = df.groupby('Pizzeria code')
pizzeria_df_list = []
for i, code in enumerate(pizzeria_code):
 #selecting the related Pizzeria commercial name
 description = descriptions_pizzeria[i]
 df_temp = G.get_group(code)
 #Dropping and trasposing the Pizzeria Code column
 df_temp = df_temp.drop(columns="Pizzeria code").transpose()
 #Based on what we know about the columns we can create the suffix
 col_suffix = ['Average Values', '# of customers ', 'Servings']
 #and rename the columns 
 df_temp.columns = [description + "_" + x for x in col_suffix]
 pizzeria_df_list.append(df_temp)
pizzeria_df = reduce(lambda x, y: pd.merge(x, y,right_index=True, left_index=True), pizzeria_df_list)
pizzeria_df.to_excel("Pizzeria_trasposed.xlsx")
tdy
2,2661 gold badge10 silver badges21 bronze badges
asked Nov 17, 2021 at 18:44
\$\endgroup\$

1 Answer 1

3
\$\begingroup\$

TL;DR

Instead of looping through each pizzeria and merging temporary subframes, it's more idiomatic to manipulate/map the columns at once.

The bulk of the work is just changing the original columns (left) to be filled/dropped/sorted/renamed (right):

 FROM THESE TO THESE 
|----------------------|-------------------------------------------------|
| Category Pizzeria | Category Pizzeria |
| Sales NaN | |
| NaN 1.0 | Average Values Pizzeria Da Michele - Napoli |
| NaN 2.0 | # of Customers Pizzeria Da Michele - Napoli |
| NaN 3.0 | Servings Pizzeria Da Michele - Napoli |
| People NaN | |
| NaN 1.0 | Average Values Pizzeria Da Michele - Roma |
| NaN 2.0 | # of Customers Pizzeria Da Michele - Roma |
| NaN 3.0 | Servings Pizzeria Da Michele - Roma |
| Dishes NaN | |
| NaN 1.0 | Average Values Pizzeria Da Michele - Bologna |
| NaN 2.0 | # of Customers Pizzeria Da Michele - Bologna |
| NaN 3.0 | Servings Pizzeria Da Michele - Bologna |
|----------------------|-------------------------------------------------|
df = pd.read_excel(url_path, header=3)
df = df.rename(columns={'Unnamed: 0': 'Category', 'Unnamed: 1': 'Pizzeria'})
# fill the null categories
df['Category'] = df['Category'].ffill()
# group up the pizzerias
df = df.dropna().sort_values('Pizzeria')
# map to the preferred pizzeria names
df['Pizzeria'] = df['Pizzeria'].map({
 1: 'Pizzeria Da Michele - Napoli',
 2: 'Pizzeria Da Michele - Roma',
 3: 'Pizzeria Da Michele - Bologna',
 4: 'Pizzeria Da Michele - Londra',
})
# map to the preferred category names
df['Category'] = df['Category'].map({
 'Dishes': 'Servings',
 'People': '# of Customers',
 'Sales': 'Average Values',
})
# set the combined pizzeria_category as the index
df = df.set_index(['Pizzeria', 'Category'])
df.index = [f'{p}_{c}' for p, c in df.index] # or ['_'.join(i) for i in df.index]
# transpose and output
df.T.to_excel('Pizzeria_transposed.xlsx')

Output:

 Pizzeria Da Michele - Pizzeria Da Michele - Pizzeria Da Michele - Pizzeria Da Michele - Pizzeria Da Michele - Pizzeria Da Michele - Pizzeria Da Michele - Pizzeria Da Michele - Pizzeria Da Michele -
 Napoli_Average Values Napoli_# of Customers Napoli_Servings Roma_Average Values Roma_# of Customers Roma_Servings Bologna_Average Values Bologna_# of Customers Bologna_Servings
2021年11月01日 00:00:00 100.0 70.0 10.0 300.0 200.0 30.0 50.0 20.0 5.0 
2021年11月08日 00:00:00 250.0 160.0 25.0 350.0 150.0 35.0 100.0 70.0 10.0 
2021年11月15日 00:00:00 100.0 70.0 10.0 500.0 300.0 50.0 250.0 160.0 25.0 
2021年11月22日 00:00:00 300.0 200.0 30.0 2000.0 1000.0 200.0 10.0 20.0 1.0 
2021年11月29日 00:00:00 400.0 250.0 40.0 1000.0 500.0 100.0 300.0 200.0 30.0 

Detailed breakdown

  1. Load the excel sheet without setting index_col because it will be easier to manipulate the Category as a column rather than index:

    df = pd.read_excel(url_path, header=3)
    df = df.rename(columns={'Unnamed: 0': 'Category', 'Unnamed: 1': 'Pizzeria'})
    # Category Pizzeria 2021年11月01日 00:00:00 2021年11月08日 00:00:00 2021年11月15日 00:00:00 2021年11月22日 00:00:00 2021年11月29日 00:00:00
    # 0 Sales NaN NaN NaN NaN NaN NaN
    # 1 NaN 1.0 100.0 250.0 100.0 300.0 400.0
    # 2 NaN 2.0 300.0 350.0 500.0 2000.0 1000.0
    # 3 NaN 3.0 50.0 100.0 250.0 10.0 300.0
    # 4 People NaN NaN NaN NaN NaN NaN
    # 5 NaN 1.0 70.0 160.0 70.0 200.0 250.0
    # 6 NaN 2.0 200.0 150.0 300.0 1000.0 500.0
    # 7 NaN 3.0 20.0 70.0 160.0 20.0 200.0
    # 8 Dishes NaN NaN NaN NaN NaN NaN
    # 9 NaN 1.0 10.0 25.0 10.0 30.0 40.0
    # 10 NaN 2.0 30.0 35.0 50.0 200.0 100.0
    # 11 NaN 3.0 5.0 10.0 25.0 1.0 30.0
    
  2. ffill (forward-fill) the null Category values and then sort by Pizzeria so that the pizzerias are grouped up:

    df['Category'] = df['Category'].ffill()
    df = df.dropna().sort_values('Pizzeria')
    # Category Pizzeria 2021年11月01日 00:00:00 2021年11月08日 00:00:00 2021年11月15日 00:00:00 2021年11月22日 00:00:00 2021年11月29日 00:00:00
    # 1 Sales 1.0 100.0 250.0 100.0 300.0 400.0
    # 5 People 1.0 70.0 160.0 70.0 200.0 250.0
    # 9 Dishes 1.0 10.0 25.0 10.0 30.0 40.0
    # 2 Sales 2.0 300.0 350.0 500.0 2000.0 1000.0
    # 6 People 2.0 200.0 150.0 300.0 1000.0 500.0
    # 10 Dishes 2.0 30.0 35.0 50.0 200.0 100.0
    # 3 Sales 3.0 50.0 100.0 250.0 10.0 300.0
    # 7 People 3.0 20.0 70.0 160.0 20.0 200.0
    # 11 Dishes 3.0 5.0 10.0 25.0 1.0 30.0
    
  3. map Pizzeria and Category to your preferred strings:

    df['Pizzeria'] = df['Pizzeria'].map({
     1: 'Pizzeria Da Michele - Napoli',
     2: 'Pizzeria Da Michele - Roma',
     3: 'Pizzeria Da Michele - Bologna',
     4: 'Pizzeria Da Michele - Londra',
    })
    df['Category'] = df['Category'].map({
     'Dishes': 'Servings',
     'People': '# of Customers',
     'Sales': 'Average Values',
    })
    # Category Pizzeria 2021年11月01日 00:00:00 2021年11月08日 00:00:00 2021年11月15日 00:00:00 2021年11月22日 00:00:00 2021年11月29日 00:00:00
    # 1 Average Values Pizzeria Da Michele - Napoli 100.0 250.0 100.0 300.0 400.0
    # 5 # of Customers Pizzeria Da Michele - Napoli 70.0 160.0 70.0 200.0 250.0
    # 9 Servings Pizzeria Da Michele - Napoli 10.0 25.0 10.0 30.0 40.0
    # 2 Average Values Pizzeria Da Michele - Roma 300.0 350.0 500.0 2000.0 1000.0
    # 6 # of Customers Pizzeria Da Michele - Roma 200.0 150.0 300.0 1000.0 500.0
    # 10 Servings Pizzeria Da Michele - Roma 30.0 35.0 50.0 200.0 100.0
    # 3 Average Values Pizzeria Da Michele - Bologna 50.0 100.0 250.0 10.0 300.0
    # 7 # of Customers Pizzeria Da Michele - Bologna 20.0 70.0 160.0 20.0 200.0
    # 11 Servings Pizzeria Da Michele - Bologna 5.0 10.0 25.0 1.0 30.0
    
  4. Join Pizzeria and Category inside the index (either explicitly unpack+concat the tuples or join them):

    df = df.set_index(['Pizzeria', 'Category'])
    df.index = [f'{p}_{c}' for p, c in df.index] # or ['_'.join(i) for i in df.index]
    # 2021年11月01日 00:00:00 2021年11月08日 00:00:00 2021年11月15日 00:00:00 2021年11月22日 00:00:00 2021年11月29日 00:00:00
    # Pizzeria Da Michele - Napoli_Average Values 100.0 250.0 100.0 300.0 400.0
    # Pizzeria Da Michele - Napoli_# of Customers 70.0 160.0 70.0 200.0 250.0
    # Pizzeria Da Michele - Napoli_Servings 10.0 25.0 10.0 30.0 40.0
    # Pizzeria Da Michele - Roma_Average Values 300.0 350.0 500.0 2000.0 1000.0
    # Pizzeria Da Michele - Roma_# of Customers 200.0 150.0 300.0 1000.0 500.0
    # Pizzeria Da Michele - Roma_Servings 30.0 35.0 50.0 200.0 100.0
    # Pizzeria Da Michele - Bologna_Average Values 50.0 100.0 250.0 10.0 300.0
    # Pizzeria Da Michele - Bologna_# of Customers 20.0 70.0 160.0 20.0 200.0
    # Pizzeria Da Michele - Bologna_Servings 5.0 10.0 25.0 1.0 30.0
    
  5. Finally transpose and save:

    df.T.to_excel('Pizzeria_transposed.xlsx')
    # Pizzeria Da Michele - Pizzeria Da Michele - Pizzeria Da Michele - Pizzeria Da Michele - Pizzeria Da Michele - Pizzeria Da Michele - Pizzeria Da Michele - Pizzeria Da Michele - Pizzeria Da Michele -
    # Napoli_Average Values Napoli_# of Customers Napoli_Servings Roma_Average Values Roma_# of Customers Roma_Servings Bologna_Average Values Bologna_# of Customers Bologna_Servings
    # 2021年11月01日 00:00:00 100.0 70.0 10.0 300.0 200.0 30.0 50.0 20.0 5.0 
    # 2021年11月08日 00:00:00 250.0 160.0 25.0 350.0 150.0 35.0 100.0 70.0 10.0 
    # 2021年11月15日 00:00:00 100.0 70.0 10.0 500.0 300.0 50.0 250.0 160.0 25.0 
    # 2021年11月22日 00:00:00 300.0 200.0 30.0 2000.0 1000.0 200.0 10.0 20.0 1.0 
    # 2021年11月29日 00:00:00 400.0 250.0 40.0 1000.0 500.0 100.0 300.0 200.0 30.0 
    
answered Nov 18, 2021 at 9:03
\$\endgroup\$
3
  • 2
    \$\begingroup\$ Nice explanation and definitely more pandas-idiomatic. I would suggest in step 3, dict(enumerate(descriptions_pizzeria, 1)) is maybe a bit too clever, and instead descriptions_pizzeria should be merged with pizzeria_code and become a simple dictionary or intenum, mapping location codes to strings. This way the correspondence is explicit so you don't end up accidentally adding a name to the wrong index in the list and making the auto mapping invalid, if that makes sense. Also more efficient at runtime to predefine this. \$\endgroup\$ Commented Nov 18, 2021 at 17:32
  • 2
    \$\begingroup\$ Also in section 4 df.index = ['_'.join(i) for i in df.index] maybe do a tuple unpacking to make it more explicit, e.g. df.index = [f"{pizzeria}_{sales_category}" for pizzeria, sales_category in df.index] - it's less performant than your approach but a lot clearer I think and this is not a performance bottleneck in the code so I'd take that trade-off :). Subjective though... \$\endgroup\$ Commented Nov 18, 2021 at 17:39
  • \$\begingroup\$ @Greedo Thanks for the feedback! Great points -- will update. \$\endgroup\$ Commented Nov 18, 2021 at 17:46

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.