Load many stored dataframes, modify them and combine them

Question 1

I load many dataframes (using pickle), extract the interesting values from each one, and finally combine them. Is there a better way to do it? For example, could a for loop load them automatically instead of my having to repeat the code many times?

import pandas as pd

#Dataframe 1
df_NB_E = pd.read_pickle('/Dataframes/Total Accuracy NB_E')
lvl0 = ['AWA', 'REM', 'S1', 'S2', 'SWS', 'ALL'] 
lvl1 = ['10x20', '10x10', '10x5', '10x2', ] 
df_NB_E.index = pd.MultiIndex.from_product([lvl0, lvl1])
df_NB_E['Total Acc'] = df_NB_E.mean(axis=1).round(4)*100 
df_NB_E=df_NB_E.reset_index().pivot(index='level_1', columns='level_0', values='Total Acc').loc['10x10'].reset_index()
df_NB_E.rename(columns={'10x10':'NB_E'},inplace=True)
#Dataframe 2
df_LDA_E = pd.read_pickle('/Dataframes/Total Accuracy LDA_E')
lvl0 = ['AWA', 'REM', 'S1', 'S2', 'SWS', 'ALL'] 
lvl1 = ['10x20', '10x10', '10x5', '10x2', ] 
df_LDA_E.index = pd.MultiIndex.from_product([lvl0, lvl1])
df_LDA_E['Total Acc'] = df_LDA_E.mean(axis=1).round(4)*100 
df_LDA_E=df_LDA_E.reset_index().pivot(index='level_1', columns='level_0', values='Total Acc').loc['10x10'].reset_index()
df_LDA_E.rename(columns={'10x10':'LDA_E'},inplace=True)
#Dataframe 3
df_DT_E = pd.read_pickle('/Dataframes/Total Accuracy DT_E')
lvl0 = ['AWA', 'REM', 'S1', 'S2', 'SWS', 'ALL'] 
lvl1 = ['10x20', '10x10', '10x5', '10x2', ] 
df_DT_E.index = pd.MultiIndex.from_product([lvl0, lvl1]) 
df_DT_E['Total Acc'] = df_DT_E.mean(axis=1).round(4)*100 
df_DT_E=df_DT_E.reset_index().pivot(index='level_1', columns='level_0', values='Total Acc').loc['10x10'].reset_index()
df_DT_E.rename(columns={'10x10':'DT_E'},inplace=True)
#Dataframe 4
df_RF_E = pd.read_pickle('/Dataframes/Total Accuracy RF_E')
lvl0 = ['AWA', 'REM', 'S1', 'S2', 'SWS', 'ALL'] 
lvl1 = ['10x20', '10x10', '10x5', '10x2', ] 
df_RF_E.index = pd.MultiIndex.from_product([lvl0, lvl1]) 
df_RF_E['Total Acc'] = df_RF_E.mean(axis=1).round(4)*100 
df_RF_E=df_RF_E.reset_index().pivot(index='level_1', columns='level_0', values='Total Acc').loc['10x10'].reset_index()
df_RF_E.rename(columns={'10x10':'RF_E'},inplace=True)
#Combining the 4 dataframes
pd.concat([df_NB_E, df_LDA_E, df_DT_E, df_RF_E], axis=1)

Answer 1

There are definitely ways to simplify this code to avoid code repetition. For each of your dataframes, I noticed that you are doing essentially the same thing with only two differences:

the path to the frame
the custom name of each frame

This suggests that these blocks can be simplified into a single function with the two dynamic elements passed in as parameters like this.

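import pandas as pd

def modifyFrame(frameName, frameLocation):
    # Load one pickled frame and label its rows with the two index levels.
    df = pd.read_pickle(frameLocation)
    lvl0 = ['AWA', 'REM', 'S1', 'S2', 'SWS', 'ALL']
    lvl1 = ['10x20', '10x10', '10x5', '10x2']
    df.index = pd.MultiIndex.from_product([lvl0, lvl1])
    # Row mean, rounded to four decimals and scaled by 100.
    df['Total Acc'] = df.mean(axis=1).round(4) * 100
    # Keep only the '10x10' row and name the value column after the frame.
    # pivot arguments are passed by keyword, as pandas 2.0 and later require.
    df = df.reset_index().pivot(index='level_1', columns='level_0', values='Total Acc').loc['10x10'].reset_index()
    df.rename(columns={'10x10': frameName}, inplace=True)
    return df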

Now there are a few ways to simplify your code. One of the simplest ways would be to just call the function four times. This might be an option if you only have a few frames to modify, and the parameters for the frames don't change.

df_NB_E = modifyFrame('NB_E', '/Dataframes/Total Accuracy NB_E')
df_LDA_E = modifyFrame('LDA_E', '/Dataframes/Total Accuracy LDA_E')
df_DT_E = modifyFrame('DT_E' , '/Dataframes/Total Accuracy DT_E')
df_RF_E = modifyFrame('RF_E' , '/Dataframes/Total Accuracy RF_E')
pd.concat([df_NB_E, df_LDA_E, df_DT_E, df_RF_E], axis=1)

If you have many frames, or if the list of frames changes often, I suggest grouping the parameters together in a list. This makes it easier to loop through the frames. It also makes it easier to store the parameters in a separate location, so you can change the frame information without changing the code, which reduces the chance of introducing errors into working code.

frames = [('NB_E' , '/Dataframes/Total Accuracy NB_E'),
 ('LDA_E', '/Dataframes/Total Accuracy LDA_E'),
 ('DT_E' , '/Dataframes/Total Accuracy DT_E'),
 ('RF_E' , '/Dataframes/Total Accuracy RF_E')]
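
As a rough sketch of the "separate location" idea, the list could live in a JSON file that the script reads at startup. The file name frames.json and the helper load_frame_specs are hypothetical and not part of the original answer; the example writes the file itself only so that it runs on its own.

```python
import json
import os
import tempfile

def load_frame_specs(config_path):
    """Read (name, location) pairs from a JSON file (hypothetical helper)."""
    with open(config_path) as handle:
        entries = json.load(handle)
    # JSON has no tuples, so each entry is a two-item list; convert it back.
    return [(name, location) for name, location in entries]

# Write an example config so the sketch is self-contained; in practice the
# file would be maintained separately from the code.
config_path = os.path.join(tempfile.mkdtemp(), 'frames.json')
with open(config_path, 'w') as handle:
    json.dump([['NB_E', '/Dataframes/Total Accuracy NB_E'],
               ['LDA_E', '/Dataframes/Total Accuracy LDA_E']], handle)

frames = load_frame_specs(config_path)
print(frames)
```

Adding or removing a frame then means editing the JSON file rather than the script.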

You can use a few different methods to loop through the frames. First, you can use a for loop, as you mentioned.

modifiedFrames = []
for frame in frames:
 modifiedFrames.append(modifyFrame(frame[0],frame[1]))
pd.concat(modifiedFrames, axis=1)

However, one of the most powerful features of the Python language is the list comprehension. I suggest that you read more about it.
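
To make the comparison concrete, here is a minimal sketch that builds the same list twice, once with an explicit loop and once with a list comprehension. The function describe is a hypothetical stand-in for modifyFrame that returns a string, so the example runs without pandas or any pickle files.

```python
def describe(name, location):
    # Hypothetical stand-in for modifyFrame: returns a string, not a frame.
    return name + ' <- ' + location

frames = [('NB_E', '/Dataframes/Total Accuracy NB_E'),
          ('LDA_E', '/Dataframes/Total Accuracy LDA_E')]

# Explicit loop: start with an empty list and append one result at a time.
described_loop = []
for frame in frames:
    described_loop.append(describe(frame[0], frame[1]))

# List comprehension: the same list built in a single expression.
described_comprehension = [describe(frame[0], frame[1]) for frame in frames]

print(described_loop == described_comprehension)
```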

Below is my solution to your problem using a function, a list containing the frame information, and a list comprehension.

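import pandas as pd

def modifyFrame(frameName, frameLocation):
    # Load one pickled frame and label its rows with the two index levels.
    df = pd.read_pickle(frameLocation)
    lvl0 = ['AWA', 'REM', 'S1', 'S2', 'SWS', 'ALL']
    lvl1 = ['10x20', '10x10', '10x5', '10x2']
    df.index = pd.MultiIndex.from_product([lvl0, lvl1])
    # Row mean, rounded to four decimals and scaled by 100.
    df['Total Acc'] = df.mean(axis=1).round(4) * 100
    # Keep only the '10x10' row and name the value column after the frame.
    # pivot arguments are passed by keyword, as pandas 2.0 and later require.
    df = df.reset_index().pivot(index='level_1', columns='level_0', values='Total Acc').loc['10x10'].reset_index()
    df.rename(columns={'10x10': frameName}, inplace=True)
    return df

# (name, location) pairs, one per stored frame.
frames = [('NB_E', '/Dataframes/Total Accuracy NB_E'),
          ('LDA_E', '/Dataframes/Total Accuracy LDA_E'),
          ('DT_E', '/Dataframes/Total Accuracy DT_E'),
          ('RF_E', '/Dataframes/Total Accuracy RF_E')]

#Combining the 4 dataframes
pd.concat([modifyFrame(frame[0], frame[1]) for frame in frames], axis=1)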
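
To try the approach without the original pickle files, a sketch like the following first writes stand-in data to a temporary directory. The random values, the three-column shape of each frame, the temporary paths and the snake_case name modify_frame are all assumptions made for illustration; only the index levels and the processing steps come from the code above.

```python
import os
import tempfile

import numpy as np
import pandas as pd

LVL0 = ['AWA', 'REM', 'S1', 'S2', 'SWS', 'ALL']
LVL1 = ['10x20', '10x10', '10x5', '10x2']

def modify_frame(frame_name, frame_location):
    """Same steps as modifyFrame, with pivot arguments passed by keyword."""
    df = pd.read_pickle(frame_location)
    df.index = pd.MultiIndex.from_product([LVL0, LVL1])
    df['Total Acc'] = df.mean(axis=1).round(4) * 100
    df = (df.reset_index()
            .pivot(index='level_1', columns='level_0', values='Total Acc')
            .loc['10x10']
            .reset_index())
    return df.rename(columns={'10x10': frame_name})

# Assumption: each stored frame has one row per (lvl0, lvl1) pair, 24 in all.
# The values here are random placeholders, not real accuracies.
rng = np.random.default_rng(0)
folder = tempfile.mkdtemp()
frames = []
for name in ['NB_E', 'LDA_E']:
    location = os.path.join(folder, 'Total Accuracy ' + name)
    pd.DataFrame(rng.random((len(LVL0) * len(LVL1), 3))).to_pickle(location)
    frames.append((name, location))

result = pd.concat([modify_frame(name, location) for name, location in frames],
                   axis=1)
print(result)
```

Each processed frame contributes a level_0 column plus its own value column, so the combined result repeats level_0 once per frame, as in the original code.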

Comment 1

You don't need the [...] in the last code block: pd.concat can also take a generator expression, which must be wrapped in its own parentheses because axis=1 is passed as a second argument. Also use tuple unpacking: pd.concat((modifyFrame(*frame) for frame in frames), axis=1)
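
A small sketch of that suggestion, using in-memory frames so that it runs on its own. The function make_frame is a hypothetical stand-in for modifyFrame. Note that Python only accepts an unparenthesized generator expression when it is the sole argument of a call, so with axis=1 present the generator needs its own parentheses.

```python
import pandas as pd

def make_frame(name, start):
    # Hypothetical stand-in for modifyFrame: builds a tiny one-column frame.
    return pd.DataFrame({name: [start, start + 1]})

specs = [('a', 1), ('b', 10)]

# *spec unpacks each (name, start) tuple into the two positional arguments.
combined = pd.concat((make_frame(*spec) for spec in specs), axis=1)
print(combined)
```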

Joel Johnson 963 bronze badges · Answer 1 · 2016-11-02 23:07:07Z
