I load many dataframes (using pickle), extract the values of interest from each one, and finally combine them. Is there a better way to do this? For example, could a for loop process them automatically instead of my having to repeat the same code many times?
#Dataframe 1
df_NB_E = pd.read_pickle('/Dataframes/Total Accuracy NB_E')
lvl0 = ['AWA', 'REM', 'S1', 'S2', 'SWS', 'ALL']
lvl1 = ['10x20', '10x10', '10x5', '10x2', ]
df_NB_E.index = pd.MultiIndex.from_product([lvl0, lvl1])
df_NB_E['Total Acc'] = df_NB_E.mean(axis=1).round(4)*100
df_NB_E=df_NB_E.reset_index().pivot('level_1', 'level_0', 'Total Acc').loc['10x10'].reset_index()
df_NB_E.rename(columns={'10x10':'NB_E'},inplace=True)
#Dataframe 2
df_LDA_E = pd.read_pickle('/Dataframes/Total Accuracy LDA_E')
lvl0 = ['AWA', 'REM', 'S1', 'S2', 'SWS', 'ALL']
lvl1 = ['10x20', '10x10', '10x5', '10x2', ]
df_LDA_E.index = pd.MultiIndex.from_product([lvl0, lvl1])
df_LDA_E['Total Acc'] = df_LDA_E.mean(axis=1).round(4)*100
df_LDA_E=df_LDA_E.reset_index().pivot('level_1', 'level_0', 'Total Acc').loc['10x10'].reset_index()
df_LDA_E.rename(columns={'10x10':'LDA_E'},inplace=True)
#Dataframe 3
df_DT_E = pd.read_pickle('/Dataframes/Total Accuracy DT_E')
lvl0 = ['AWA', 'REM', 'S1', 'S2', 'SWS', 'ALL']
lvl1 = ['10x20', '10x10', '10x5', '10x2', ]
df_DT_E.index = pd.MultiIndex.from_product([lvl0, lvl1])
df_DT_E['Total Acc'] = df_DT_E.mean(axis=1).round(4)*100
df_DT_E=df_DT_E.reset_index().pivot('level_1', 'level_0', 'Total Acc').loc['10x10'].reset_index()
df_DT_E.rename(columns={'10x10':'DT_E'},inplace=True)
#Dataframe 4
df_RF_E = pd.read_pickle('/Dataframes/Total Accuracy RF_E')
lvl0 = ['AWA', 'REM', 'S1', 'S2', 'SWS', 'ALL']
lvl1 = ['10x20', '10x10', '10x5', '10x2', ]
df_RF_E.index = pd.MultiIndex.from_product([lvl0, lvl1])
df_RF_E['Total Acc'] = df_RF_E.mean(axis=1).round(4)*100
df_RF_E=df_RF_E.reset_index().pivot('level_1', 'level_0', 'Total Acc').loc['10x10'].reset_index()
df_RF_E.rename(columns={'10x10':'RF_E'},inplace=True)
#Combining the 4 dataframes
pd.concat([df_NB_E, df_LDA_E, df_DT_E, df_RF_E], axis=1)
1 Answer
There are definitely ways to simplify this code to avoid code repetition. For each of your dataframes, I noticed that you are doing essentially the same thing with only two differences:
- the path to the frame
- the custom name of each frame
This suggests that these blocks can be simplified into a single function with the two dynamic elements passed in as parameters like this.
def modifyFrame(frameName, frameLocation):
    df = pd.read_pickle(frameLocation)
    lvl0 = ['AWA', 'REM', 'S1', 'S2', 'SWS', 'ALL']
    lvl1 = ['10x20', '10x10', '10x5', '10x2']
    df.index = pd.MultiIndex.from_product([lvl0, lvl1])
    df['Total Acc'] = df.mean(axis=1).round(4)*100
    # Keyword arguments are used here because pandas 2.0+ rejects positional arguments to pivot
    df = df.reset_index().pivot(index='level_1', columns='level_0', values='Total Acc').loc['10x10'].reset_index()
    df.rename(columns={'10x10': frameName}, inplace=True)
    return df
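To see what the function produces without needing the pickle files, here is a minimal sketch that runs the same steps on synthetic data. The helper name `summarize_frame`, the constants `LVL0`/`LVL1`, and the 24x3 random input are assumptions made for illustration; the real frames may have a different number of columns.

```python
import numpy as np
import pandas as pd

LVL0 = ['AWA', 'REM', 'S1', 'S2', 'SWS', 'ALL']
LVL1 = ['10x20', '10x10', '10x5', '10x2']


def summarize_frame(df, frame_name):
    """Hypothetical variant of modifyFrame that takes an already loaded frame."""
    df = df.copy()  # leave the caller's frame untouched
    df.index = pd.MultiIndex.from_product([LVL0, LVL1])
    df['Total Acc'] = df.mean(axis=1).round(4) * 100
    wide = df.reset_index().pivot(index='level_1', columns='level_0', values='Total Acc')
    # Keep only the '10x10' row and name the value column after the frame
    return wide.loc['10x10'].reset_index().rename(columns={'10x10': frame_name})


# Synthetic stand-in for one pickled frame: 6 stages x 4 settings = 24 rows
rng = np.random.default_rng(0)
raw = pd.DataFrame(rng.uniform(0.5, 1.0, size=(24, 3)))

result = summarize_frame(raw, 'NB_E')
print(result)
```

The result has one row per entry of `LVL0` and two columns, `level_0` and the frame name. Separating the loading step from the transformation like this also makes the logic easier to test.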
Now there are a few ways to simplify your code. One of the simplest ways would be to just call the function four times. This might be an option if you only have a few frames to modify, and the parameters for the frames don't change.
df_NB_E = modifyFrame('NB_E', '/Dataframes/Total Accuracy NB_E')
df_LDA_E = modifyFrame('LDA_E', '/Dataframes/Total Accuracy LDA_E')
df_DT_E = modifyFrame('DT_E' , '/Dataframes/Total Accuracy DT_E')
df_RF_E = modifyFrame('RF_E' , '/Dataframes/Total Accuracy RF_E')
pd.concat([df_NB_E, df_LDA_E, df_DT_E, df_RF_E], axis=1)
If you have a lot of frames, or if you need to modify the list of frames to be modified often, then I would suggest that you group the parameters together in a list. This makes it easier to loop through the frames. Grouping the parameters into a list like this would also make it easier to store the parameters in a separate location so you could change the frame information without changing the code. This reduces the chance of introducing errors into working code.
frames = [('NB_E' , '/Dataframes/Total Accuracy NB_E'),
          ('LDA_E', '/Dataframes/Total Accuracy LDA_E'),
          ('DT_E' , '/Dataframes/Total Accuracy DT_E'),
          ('RF_E' , '/Dataframes/Total Accuracy RF_E')]
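As one possible way to keep the frame information outside the code, the list could be read from a small JSON file. This is only a sketch: the file name `frames.json`, its `name`/`path` keys, and the helper `load_frame_specs` are assumptions, and the temporary directory is used only so that the example is self-contained.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical file contents; the names and paths mirror the list above.
config_text = json.dumps([
    {"name": "NB_E", "path": "/Dataframes/Total Accuracy NB_E"},
    {"name": "LDA_E", "path": "/Dataframes/Total Accuracy LDA_E"},
    {"name": "DT_E", "path": "/Dataframes/Total Accuracy DT_E"},
    {"name": "RF_E", "path": "/Dataframes/Total Accuracy RF_E"},
])


def load_frame_specs(config_path):
    """Read (name, path) tuples from a JSON file."""
    with open(config_path, encoding="utf-8") as handle:
        entries = json.load(handle)
    return [(entry["name"], entry["path"]) for entry in entries]


# Write the config to a temporary location, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "frames.json"
    config_path.write_text(config_text, encoding="utf-8")
    frames = load_frame_specs(config_path)

print(frames[0])
```

With this layout, adding or removing a frame means editing the JSON file only.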
You can use a few different methods to loop through the frames. First, you can use a for loop like you mentioned.
modifiedFrames = []
for frame in frames:
    modifiedFrames.append(modifyFrame(frame[0], frame[1]))
pd.concat(modifiedFrames, axis=1)
However, one of the most powerful features of the Python language is the list comprehension. I suggest that you read more about it.
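For comparison, here is a small sketch showing that a for loop with `append` and a list comprehension build the same list. The function `describe` is a hypothetical stand-in for `modifyFrame` so that the example runs without pandas or the pickle files.

```python
def describe(name, location):
    # Hypothetical stand-in for modifyFrame: returns a string instead of a frame
    return f"{name} <- {location}"


frames = [('NB_E', '/Dataframes/Total Accuracy NB_E'),
          ('LDA_E', '/Dataframes/Total Accuracy LDA_E')]

# Explicit loop: create an empty list and append one result per item
with_loop = []
for name, location in frames:
    with_loop.append(describe(name, location))

# List comprehension: the same result as a single expression
with_comprehension = [describe(name, location) for name, location in frames]

print(with_loop == with_comprehension)
```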
Below is my solution to your problem using a function, a list containing the frame information, and a list comprehension.
def modifyFrame(frameName, frameLocation):
    df = pd.read_pickle(frameLocation)
    lvl0 = ['AWA', 'REM', 'S1', 'S2', 'SWS', 'ALL']
    lvl1 = ['10x20', '10x10', '10x5', '10x2']
    df.index = pd.MultiIndex.from_product([lvl0, lvl1])
    df['Total Acc'] = df.mean(axis=1).round(4)*100
    # Keyword arguments are used here because pandas 2.0+ rejects positional arguments to pivot
    df = df.reset_index().pivot(index='level_1', columns='level_0', values='Total Acc').loc['10x10'].reset_index()
    df.rename(columns={'10x10': frameName}, inplace=True)
    return df

frames = [('NB_E' , '/Dataframes/Total Accuracy NB_E'),
          ('LDA_E', '/Dataframes/Total Accuracy LDA_E'),
          ('DT_E' , '/Dataframes/Total Accuracy DT_E'),
          ('RF_E' , '/Dataframes/Total Accuracy RF_E')]
#Combining the 4 dataframes
pd.concat([modifyFrame(frame[0],frame[1]) for frame in frames], axis=1)
You don't need the [...] in the last code block. pd.concat can also take a generator expression. Also use tuple unpacking: pd.concat((modifyFrame(*frame) for frame in frames), axis=1) – Graipher, Nov 3, 2016 at 11:44
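The suggestion in this comment can be sketched as follows. Note that a generator expression needs its own parentheses when it is not the only argument of a call. The function `make_frame` and the `specs` list are hypothetical stand-ins for `modifyFrame` and `frames`, so that the example runs without the pickle files.

```python
import pandas as pd


def make_frame(name, start):
    # Hypothetical stand-in for modifyFrame: builds a tiny one-column frame
    return pd.DataFrame({name: [start, start + 1]})


specs = [('NB_E', 1), ('LDA_E', 10)]

# *spec unpacks each tuple into the function's two parameters.
# The inner parentheses are required because axis=1 is passed as well.
combined = pd.concat((make_frame(*spec) for spec in specs), axis=1)

print(combined)
```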