
I am reading an Excel file using Pandas, and I feel like there has to be a better way to build the column names I want. The file looks something like this:

              1         2               # '1' is merged in the two cells above 'a' and 'b';
    Date      a    b    c    d          # likewise for '2' above 'c' and 'd' (merged, as opposed to 'centered across selection')
1   1-Jan-19  100  200  300  400
2   1-Feb-19  101  201  301  401
3   1-Mar-19  102  202  302  402

I want to merge the 'a', 'b', 'c', and 'd' column headers with the '1' and '2' above them, so I do the following to get the headers the way I want:

import pandas as pd
import json

xls = pd.ExcelFile(r'C:\Path_to\Excel_Pandas_Connector_Test.xls')
df = pd.read_excel(xls, 'Sheet1', header=[1])  # uses the 'a b c d' row as the column names

# I only want the most recent day of data, so keep just that row
json_str = df[df.Date == df['Date'].max()].to_json(orient='records', date_format='iso')
dat_data = json.loads(json_str)[0]

def clean_json():
    global dat_data
    dat_data['1a'] = dat_data.pop('a')
    dat_data['1b'] = dat_data.pop('b')
    dat_data['2c'] = dat_data.pop('c')
    dat_data['2d'] = dat_data.pop('d')

clean_json()
print(json.dumps(dat_data, indent=4))

My desired output is:

{
    "Date": "2019-03-01T00:00:00.000Z",
    "1a": 102,
    "1b": 202,
    "2c": 302,
    "2d": 402
}

This works as written, but is there a Pandas built-in that I could have used to do the same thing instead of the clean_json function?

asked Apr 12, 2019 at 0:10

1 Answer


Yes, there is an easier way, using pandas.Index.get_level_values.
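For context, get_level_values(i) returns the values of a single level of a (Multi)Index as a flat Index, which is what lets the two header rows be glued together element-wise. A minimal sketch with a toy column index (assumed here, not taken from your actual file):

import pandas as pd

# toy two-level column index, shaped like the one read from the Excel file
cols = pd.MultiIndex.from_tuples([(1, 'a'), (1, 'b'), (2, 'c'), (2, 'd')])
print(cols.get_level_values(0))  # the merged '1'/'2' level: 1, 1, 2, 2
print(cols.get_level_values(1))  # the letter level: 'a', 'b', 'c', 'd'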

To reproduce your example dataframe I had to call the read with df = pd.read_excel("/tmp/temp.xls", header=[0, 1]), so that both header rows are picked up.

Then you can just do this:

import pandas as pd
import json

# read the file with both header rows as a MultiIndex
df = pd.read_excel("/tmp/temp.xls", header=[0, 1])
df.index = pd.to_datetime(df.index)

# combine the multilevel columns into one level
df.columns = (pd.Series(df.columns.get_level_values(0)).apply(str)
              + pd.Series(df.columns.get_level_values(1)).apply(str))

# turn the date index back into a 'Date' column
df = df.reset_index()
df.columns = ["Date"] + list(df.columns[1:])
print(df)
#         Date   1a   1b   2c   2d
# 0 2019-01-01  100  200  300  400
# 1 2019-02-01  101  201  301  401
# 2 2019-03-01  102  202  302  402
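As a side note, the same flattening can be written with the MultiIndex's own map method, which passes each column label as a tuple. This is just a sketch of an alternative, not tested against your exact file:

# flatten the two header levels in one pass; each label arrives as a tuple like (1, 'a')
df.columns = df.columns.map(lambda pair: '{}{}'.format(pair[0], pair[1]))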

After that you can do something similar to what you are doing now, but get the row of the maximum date directly with idxmax instead of comparing every value to the maximum:

json_data = json.loads(df.loc[df.Date.idxmax()].to_json(date_format='iso'))
print(json.dumps(json_data, indent=4))

Which produces the desired output:

{
 "Date": "2019-01-03T00:00:00.000Z",
 "1a": 102,
 "1b": 202,
 "2c": 302,
 "2d": 402
}
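One behavioural difference worth noting (my reading, not something stated in the answer): the boolean mask in the question keeps every row that ties for the maximum date, while idxmax returns the label of only the first such row, so the two are interchangeable only when the maximum date is unique. A quick sketch:

latest_rows = df[df.Date == df['Date'].max()]  # DataFrame: all rows sharing the max date
latest_row = df.loc[df['Date'].idxmax()]       # Series: the first row with the max date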
answered Apr 12, 2019 at 8:28
  • Thanks, that's really concise and works well. I can see that there is a lot to learn in Pandas. – Commented Apr 12, 2019 at 12:24
