I'm after a pythonic and pandemic (from pandas, pun not intended =) way to pivot some rows in a dataframe into new columns.
My data has this format:
           dof  foo  bar  qux
idxA idxB
100  101     1   10   30   50
     101     2   11   31   51
     101     3   12   32   52
     102     1   13   33   53
     102     2   14   34   54
     102     3   15   35   55
200  101     1   16   36   56
     101     2   17   37   57
     101     3   18   38   58
     102     1   19   39   59
     102     2   20   40   60
     102     3   21   41   61
The variables foo, bar and qux actually have 3 dimensional coordinates, which I would like to call foo1, foo2, foo3, bar1, ..., qux3. These are identified by the column dof. Each row represents one axis in 3D, dof == 1 is the x axis, dof == 2 the y axis and dof == 3 is the z axis.
So, here is the final dataframe I want:
           foo1  bar1  qux1  foo2  bar2  qux2  foo3  bar3  qux3
idxA idxB
100  101     10    30    50    11    31    51    12    32    52
     102     13    33    53    14    34    54    15    35    55
200  101     16    36    56    17    37    57    18    38    58
     102     19    39    59    20    40    60    21    41    61
Here is what I have done.
import pandas as pd
data = [[100, 101, 1, 10, 30, 50],
        [100, 101, 2, 11, 31, 51],
        [100, 101, 3, 12, 32, 52],
        [100, 102, 1, 13, 33, 53],
        [100, 102, 2, 14, 34, 54],
        [100, 102, 3, 15, 35, 55],
        [200, 101, 1, 16, 36, 56],
        [200, 101, 2, 17, 37, 57],
        [200, 101, 3, 18, 38, 58],
        [200, 102, 1, 19, 39, 59],
        [200, 102, 2, 20, 40, 60],
        [200, 102, 3, 21, 41, 61],
        ]
df = pd.DataFrame(data=data, columns=['idxA', 'idxB', 'dof', 'foo', 'bar', 'qux'])
df.set_index(['idxA', 'idxB'], inplace=True)
#
# Here is where the magic happens - and I'm not too happy about this implementation
#
# Create an empty dataframe with the same indexes
df2 = df[df.dof == 1].reset_index()[['idxA', 'idxB']]
df2.set_index(['idxA', 'idxB'], inplace=True)
# Loop through each DOF and add columns for `bar`, `foo` and `qux` manually.
for pivot in [1, 2, 3]:
    df2.loc[:, 'foo%d' % pivot] = df[df.dof == pivot]['foo']
    df2.loc[:, 'bar%d' % pivot] = df[df.dof == pivot]['bar']
    df2.loc[:, 'qux%d' % pivot] = df[df.dof == pivot]['qux']
However I'm not too happy with these .loc calls and incremental column additions inside a loop. I thought that pandas being awesome as it is would have a neater way of doing that.
Comment: I'm too lazy to try an implementation, but perhaps you should look into a multi-index where the innermost index has size 3. (Reinderien, Jun 3, 2020 at 1:55)
1 Answer
groupby
When iterating over the values in a column, it is bad practice to hardcode them (for pivot in [1, 2, 3]). A better way would have been for pivot in df["dof"].unique(), but the best way is DataFrame.groupby.
To see what the groupby does, I first iterate over it and print each group:
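As a rough sketch of the intermediate step (deriving the loop values from the data instead of hardcoding them), the question's loop could look like the following. The cut-down frame with a single `foo` column is an assumption made only to keep the example short.

```python
import pandas as pd

# Hypothetical cut-down version of the question's data: one (idxA, idxB) pair.
df = pd.DataFrame(
    {
        "idxA": [100, 100, 100],
        "idxB": [101, 101, 101],
        "dof": [1, 2, 3],
        "foo": [10, 11, 12],
    }
).set_index(["idxA", "idxB"])

# Empty frame that only carries the unique (idxA, idxB) index.
df2 = pd.DataFrame(index=df.index.unique())

# The loop values now come from the data rather than from a literal [1, 2, 3].
for pivot in df["dof"].unique():
    df2[f"foo{pivot}"] = df.loc[df["dof"] == pivot, "foo"]

print(df2)
```

This still filters the frame once per value, which is the work that groupby does in a single pass.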
for pivot, data in df.groupby("dof"):
    print(pivot)
    print(data)
Then I take one of these groups and mold it into the shape I want. Here we no longer need the dof column, since its value is already in the pivot variable, and we rename the remaining columns using rename:
for pivot, data in df.groupby("dof"):
    print(pivot)
    print(
        data.drop(columns="dof").rename(
            mapper={
                column_name: f"{column_name}{pivot}"
                for column_name in data.columns
            },
            axis=1,
        )
    )
Then we can use pd.concat to stitch it together:
pd.concat(
    [
        data.drop(columns="dof").rename(
            mapper={
                column_name: f"{column_name}{pivot}"
                for column_name in data.columns
            },
            axis=1,
        )
        for pivot, data in df.groupby("dof")
    ],
    axis=1,
)
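For reference, here is a self-contained sketch of the groupby and concat approach, run on the question's data. The helper name `widen_by_dof` is hypothetical, and it uses `DataFrame.add_suffix` as a shortcut in place of the `rename` mapper shown above; the result should be the same.

```python
import pandas as pd

# Rebuild the question's frame: 2 x 2 index pairs, 3 dof rows each.
rows = [
    (idx_a, idx_b, dof)
    for idx_a in (100, 200)
    for idx_b in (101, 102)
    for dof in (1, 2, 3)
]
df = pd.DataFrame(rows, columns=["idxA", "idxB", "dof"])
df["foo"] = list(range(10, 22))
df["bar"] = list(range(30, 42))
df["qux"] = list(range(50, 62))
df = df.set_index(["idxA", "idxB"])


def widen_by_dof(frame, key="dof"):
    """Hypothetical helper: one block of suffixed columns per value of `key`."""
    pieces = [
        # Drop the key column, then append its value to every remaining name.
        group.drop(columns=key).add_suffix(str(value))
        for value, group in frame.groupby(key)
    ]
    # Each piece shares the (idxA, idxB) index, so concat aligns them side by side.
    return pd.concat(pieces, axis=1)


wide = widen_by_dof(df)
print(wide)
```

The columns come out grouped by dof (foo1, bar1, qux1, foo2, ...), which matches the layout asked for in the question.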
unstack
An alternative is unstack.
From your description, dof is really part of the index, so add it there. Then you can use DataFrame.unstack to move it to the columns.
df2 = df.set_index("dof", append=True).unstack("dof")
          foo foo foo bar bar bar qux qux qux
dof         1   2   3   1   2   3   1   2   3
idxA idxB
100  101   10  11  12  30  31  32  50  51  52
100  102   13  14  15  33  34  35  53  54  55
200  101   16  17  18  36  37  38  56  57  58
200  102   19  20  21  39  40  41  59  60  61
If you are okay with a MultiIndex, which will be handier than the concatenated strings in most cases, you can leave it at that. If you want the column names in the form you asked for, you can do df2.columns = df2.columns.map(lambda x: f"{x[0]}{x[1]}").
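A minimal end-to-end sketch of the unstack route follows, again on the question's data. The names `wide`, `flat`, `value_names` and `ordered` are illustrative, and the final reordering step is an addition here, assuming you want the exact foo1, bar1, qux1, foo2, ... column order from the question.

```python
import pandas as pd

# Rebuild the question's frame.
rows = [
    (idx_a, idx_b, dof)
    for idx_a in (100, 200)
    for idx_b in (101, 102)
    for dof in (1, 2, 3)
]
df = pd.DataFrame(rows, columns=["idxA", "idxB", "dof"])
df["foo"] = list(range(10, 22))
df["bar"] = list(range(30, 42))
df["qux"] = list(range(50, 62))
df = df.set_index(["idxA", "idxB"])

# Move dof into the index, then pivot it out to the columns.
wide = df.set_index("dof", append=True).unstack("dof")

# With the MultiIndex kept, selecting by variable or by axis stays simple.
foo_xyz = wide["foo"]                         # all three foo coordinates
x_only = wide.xs(1, axis=1, level="dof")      # dof == 1 for every variable

# Flatten the (name, dof) pairs into strings such as "foo1".
flat = wide.copy()
flat.columns = [f"{name}{dof}" for name, dof in wide.columns]

# Optional: reorder so the columns are grouped by dof, as in the question.
value_names = [name for name in df.columns if name != "dof"]
ordered = [
    f"{name}{dof}"
    for dof in sorted(df["dof"].unique())
    for name in value_names
]
flat = flat[ordered]
print(flat)
```

Keeping `wide` around alongside `flat` costs little and leaves both access styles available.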