
I am reading an Excel file using Pandas, and I feel like there has to be a better way to build the column names I want. The file looks something like this:

              1         2               # '1' is merged in the two cells above 'a' and 'b';
    Date      a    b    c    d          # likewise for '2' above 'c' and 'd' (merged, as opposed to 'centered across selection')
1   1-Jan-19  100  200  300  400
2   1-Feb-19  101  201  301  401
3   1-Mar-19  102  202  302  402

I want to merge the 'a', 'b', 'c', and 'd' column headers with the '1' and '2' above them, so I do the following to get the headers the way I want:

import pandas as pd
import json

xls = pd.ExcelFile(r'C:\Path_to\Excel_Pandas_Connector_Test.xls')
df = pd.read_excel(xls, 'Sheet1', header=[1])  # uses the 'a b c d' row as the column names

# I only want the most recent day of data, so keep just that row
json_str = df[df.Date == df['Date'].max()].to_json(orient='records', date_format='iso')
dat_data = json.loads(json_str)[0]

def clean_json():
    global dat_data
    dat_data['1a'] = dat_data.pop('a')
    dat_data['1b'] = dat_data.pop('b')
    dat_data['2c'] = dat_data.pop('c')
    dat_data['2d'] = dat_data.pop('d')

clean_json()
print(json.dumps(dat_data, indent=4))

My desired output is:

{
    "Date": "2019-03-01T00:00:00.000Z",
    "1a": 102,
    "1b": 202,
    "2c": 302,
    "2d": 402
}

This works as written, but is there a Pandas built-in that I could have used to do the same thing instead of the clean_json function?

asked Apr 12, 2019 at 0:10

1 Answer


Yes, there is an easier way, using pandas.Index.get_level_values.
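For context, get_level_values(i) returns the values of a single level of a (Multi)Index as a flat Index, which is what lets the two header rows be glued together element-wise. A minimal sketch with a toy column index (assumed here, not taken from your actual file):

import pandas as pd

# toy two-level column index, shaped like the one read from the Excel file
cols = pd.MultiIndex.from_tuples([(1, 'a'), (1, 'b'), (2, 'c'), (2, 'd')])
print(cols.get_level_values(0))  # the merged '1'/'2' level: 1, 1, 2, 2
print(cols.get_level_values(1))  # the letter level: 'a', 'b', 'c', 'd'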

To reproduce your example dataframe I had to call the read with df = pd.read_excel("/tmp/temp.xls", header=[0, 1]), so that both header rows are picked up.

Then you can just do this:

import pandas as pd
import json

# read the file with both header rows as a MultiIndex
df = pd.read_excel("/tmp/temp.xls", header=[0, 1])
df.index = pd.to_datetime(df.index)

# combine the multilevel columns into one level
df.columns = (pd.Series(df.columns.get_level_values(0)).apply(str)
              + pd.Series(df.columns.get_level_values(1)).apply(str))

# turn the date index back into a 'Date' column
df = df.reset_index()
df.columns = ["Date"] + list(df.columns[1:])
print(df)
#         Date   1a   1b   2c   2d
# 0 2019-01-01  100  200  300  400
# 1 2019-02-01  101  201  301  401
# 2 2019-03-01  102  202  302  402
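As a side note, the same flattening can be written with the MultiIndex's own map method, which passes each column label as a tuple. This is just a sketch of an alternative, not tested against your exact file:

# flatten the two header levels in one pass; each label arrives as a tuple like (1, 'a')
df.columns = df.columns.map(lambda pair: '{}{}'.format(pair[0], pair[1]))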

After that you can do something similar to what you are doing now, but get the row of the maximum date directly with idxmax instead of comparing every value to the maximum:

json_data = json.loads(df.loc[df.Date.idxmax()].to_json(date_format='iso'))
print(json.dumps(json_data, indent=4))

Which produces the desired output:

{
 "Date": "2019-01-03T00:00:00.000Z",
 "1a": 102,
 "1b": 202,
 "2c": 302,
 "2d": 402
}
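One behavioural difference worth noting (my reading, not something stated in the answer): the boolean mask in the question keeps every row that ties for the maximum date, while idxmax returns the label of only the first such row, so the two are interchangeable only when the maximum date is unique. A quick sketch:

latest_rows = df[df.Date == df['Date'].max()]  # DataFrame: all rows sharing the max date
latest_row = df.loc[df['Date'].idxmax()]       # Series: the first row with the max date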
answered Apr 12, 2019 at 8:28
  • Thanks, that's really concise and works well. I can see that there is a lot to learn in Pandas. – Commented Apr 12, 2019 at 12:24
