I am trying to concatenate the columns of my data frame that share the same value in the first row. For example, suppose I have this data frame:
peter brian peter mike brian
2 3 4 5 6
4 6 1 7 5
Then I want to stack the columns whose first-row value is repeated, one below the other:
peter brian mike
2 3 5
4 6 7
4 6
1 5
Note that I cannot concatenate by referring to each name directly ("peter", "mike", etc.), since the data frame I want to use this on has thousands of columns. The idea is to find the repeated names automatically and concatenate their columns.
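For reproducibility, here is a minimal sketch of how the example frame could be built and how the repeated names could be listed without typing them by hand. It assumes the names in the first row are the column labels of a pandas DataFrame; the variable name `repeated` is only illustrative.

```python
import pandas as pd

# Assumption: the names in the first row are the column labels.
df = pd.DataFrame(
    [[2, 3, 4, 5, 6],
     [4, 6, 1, 7, 5]],
    columns=["peter", "brian", "peter", "mike", "brian"],
)

# Column labels that occur more than once, found automatically.
repeated = df.columns[df.columns.duplicated()].unique().tolist()
print(repeated)  # ['peter', 'brian']
```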
1 Answer
Here's one way using pd.concat with NumPy arrays:
# order='F' reads column by column, so each duplicate column is appended whole
res = pd.concat([pd.Series(df[col].values.flatten(order='F'), name=col)
                 for col in df.columns.unique()], axis=1)
print(res)
peter brian mike
0 2 3 5.0
1 4 6 7.0
2 4 6 NaN
3 1 5 NaN
Note that mike is forced to float since it contains NaN values.
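If the float conversion is unwanted, one option is pandas' nullable integer dtype. The sketch below is not part of the original answer: it assumes a reasonably recent pandas version that supports "Int64", and it rebuilds a stand-in for `res` by hand so that it runs on its own.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the `res` frame printed above.
res = pd.DataFrame({
    "peter": [2, 4, 4, 1],
    "brian": [3, 6, 6, 5],
    "mike": [5.0, 7.0, np.nan, np.nan],
})

# The nullable integer dtype keeps whole numbers and marks gaps as <NA>.
res["mike"] = res["mike"].astype("Int64")
print(res.dtypes)
```

The capital "I" matters here: "Int64" is the nullable extension dtype, while "int64" is the plain NumPy dtype, which cannot hold missing values.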
Explanation
- df.columns.unique extracts unique column names.
- df[col].values.flatten extracts the values from all series for a particular name as a NumPy array, and then flattens them into a 1-dimensional array.
- pd.Series converts the array into a series object.
- We iterate over all such unique column names via a list comprehension.
- pd.concat concatenates a list of series into a dataframe.
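One caveat worth checking on your own data: NumPy's flatten reads a 2-D array row by row unless order='F' is passed, and the question's sample values happen to give the same result either way. Here is a minimal sketch of the steps above with made-up data (the column names "a", "b", "c" and the values are hypothetical) where the two orders differ:

```python
import pandas as pd

# Hypothetical data, chosen so that the flattening order is visible.
df = pd.DataFrame(
    [[1, 10, 3, 20, 30],
     [2, 11, 4, 21, 31]],
    columns=["a", "b", "a", "c", "b"],
)

block = df["a"].values            # 2-D array, one column per "a"
print(block.flatten())            # [1 3 2 4]  row by row
print(block.flatten(order="F"))   # [1 2 3 4]  column by column

# Stack each group of same-named columns end to end.
res = pd.concat(
    [pd.Series(df[col].values.flatten(order="F"), name=col)
     for col in df.columns.unique()],
    axis=1,
)
print(res)
```

With column-major order, each duplicate column is appended whole below the previous one, which matches the layout asked for in the question.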