
I have some data, which I've faked using:

from datetime import datetime, timedelta

import numpy as np
import pandas as pd

def fake_disrete_data():
    in_li = []
    sample_points = 24 * 4
    for day, bias in zip((11, 12, 13), (.5, .7, 1.)):
        day_time = datetime(2016, 6, day, 0, 0, 0)
        for x in range(int(sample_points)):
            in_li.append((day_time + timedelta(minutes=15 * x),
                          int(x / 4),
                          bias))
    return pd.DataFrame(in_li, columns=("time", "mag_sig", "bias")).set_index("time")

fake_disc = fake_disrete_data()

I can pivot each column individually and then concatenate the results using:

cols = list(fake_disc.columns.values)
dfs = []
for col in cols:
    dfs.append(pd.pivot_table(fake_disc,
                              index=fake_disc.index.date,
                              columns=fake_disc.index.hour,
                              values=col,
                              aggfunc=np.mean))
all_df = pd.concat(dfs, axis=1, keys=cols)

But is there a better way to do this?

I'm trying to follow the answers seen in Pandas pivot table for multiple columns at once and How to pivot multilabel table in pandas, but I'm having a difficult time translating their methods to the DateTimeIndex case.

asked Sep 17, 2018 at 14:53

1 Answer


Review

All in all, this code is rather clean. I would use generator expressions and itertools.chain in fake_disrete_data instead of the nested for-loop, but that is a matter of taste.
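A sketch of that variant, as I would write it (this is my suggestion, not code from the question; it keeps the original column names and index):

```python
from datetime import datetime, timedelta
from itertools import chain

import pandas as pd

def fake_disrete_data():
    sample_points = 24 * 4
    # one generator of rows per day, chained into a single stream;
    # chain.from_iterable exhausts each inner generator before
    # advancing the outer one, so day and bias are read correctly
    rows = chain.from_iterable(
        (
            (datetime(2016, 6, day) + timedelta(minutes=15 * x), x // 4, bias)
            for x in range(sample_points)
        )
        for day, bias in zip((11, 12, 13), (.5, .7, 1.))
    )
    return pd.DataFrame(rows, columns=("time", "mag_sig", "bias")).set_index("time")
```

pd.DataFrame accepts the iterator directly, so no intermediate list is built by hand.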

Line wraps

I prefer to wrap lines right after the ( instead of aligning with the first argument. Here I follow the same rules as black. This leads to fewer levels of indentation at the cost of slightly more lines, for example:

dfs.append(
    pd.pivot_table(
        fake_disc,
        index=fake_disc.index.date,
        columns=fake_disc.index.hour,
        values=col,
        aggfunc=np.mean,
    )
)

List comprehension

Instead of appending in a loop, you can do

dfs = [
    pd.pivot_table(
        fake_disc,
        index=fake_disc.index.date,
        columns=fake_disc.index.hour,
        values=col,
        aggfunc=np.mean,
    )
    for col in cols
]

Or even better, feed a dict to pd.concat, so you don't have to specify the keys argument:

dfs = {
    col: pd.pivot_table(
        fake_disc,
        index=fake_disc.index.date,
        columns=fake_disc.index.hour,
        values=col,
        aggfunc='mean',
    )
    for col in fake_disc.columns
}
pd.concat(dfs, axis=1)

I also changed np.mean to 'mean', so you don't have to import numpy specifically for this, and iterated over fake_disc.columns directly instead of first creating the cols list.

Alternative approach

pd.pivot_table is essentially a wrapper around groupby, unstack and stack. If you want to do something more complicated, you can do those operations by hand:

fake_disc = fake_disrete_data()
fake_disc.columns = fake_disc.columns.set_names('variable')
df = fake_disc.stack().to_frame().rename(columns={0: 'value'})
df['hour'] = df.index.get_level_values('time').hour

this creates an intermediate DataFrame:

time                 variable   value  hour
2016-06-11 00:00:00  mag_sig      0.0     0
2016-06-11 00:00:00  bias         0.5     0
2016-06-11 00:15:00  mag_sig      0.0     0
2016-06-11 00:15:00  bias         0.5     0
2016-06-11 00:30:00  mag_sig      0.0     0
...

This you can group. To bin the time index per day, you can use pd.Grouper; the hour column takes care of the within-day grouping:

pivot = (
    df.groupby([pd.Grouper(level="time", freq="d"), "hour", "variable"])
    .mean()
    .unstack(["variable", "hour"])
    .sort_index(axis="columns", level=["variable", "hour"])
)
            value
variable     bias                                             ...  mag_sig
hour            0    1    2    3    4    5    6    7    8   9 ...    14    15    16    17    18    19    20    21    22    23
time
2016-06-11    0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5  0.5 0.5 ...  14.0  15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0
2016-06-12    0.7  0.7  0.7  0.7  0.7  0.7  0.7  0.7  0.7 0.7 ...  14.0  15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0
2016-06-13    1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0 1.0 ...  14.0  15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0

Performance

According to the %%timeit cell magic in JupyterLab, the first approach (with the dict and concat) takes about 23 ms, the second approach about 10 ms. Depending on your use case, this difference might be important. If it is not, pick the method which is most readable to your future self.
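Outside a notebook, the standard-library timeit module gives comparable numbers. A minimal sketch of how such a measurement could look (the workload below is a placeholder, not either of the pivot approaches):

```python
import timeit

def workload():
    # placeholder function standing in for one of the pivot approaches
    sum(x * x for x in range(10_000))

# best of 5 repeats, 100 calls each, reported as time per call
per_call = min(timeit.repeat(workload, number=100, repeat=5)) / 100
print(f"{per_call * 1000:.3f} ms per call")
```

Taking the minimum over several repeats reduces noise from other processes, which is roughly what %%timeit reports as well.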

answered Sep 25, 2018 at 8:16