pandas - concatenate columns that share a repeated name

I am trying to concatenate the columns of my data frame that have the same value in the header row. For example, suppose I have this data frame:

peter  brian  peter  mike  brian
    2      3      4     5      6
    4      6      1     7      5

Then I want to concatenate the columns whose names are repeated in the first row, so that each name appears only once:

peter  brian  mike
    2      3     5
    4      6     7
    4      6
    1      5

Note that I cannot concatenate by referring to each name directly ("peter", "mike", etc.), since the data frame I want to use this on has thousands of columns. The idea is to find the repeated names automatically and concatenate them.

edited Jun 21, 2018 at 18:02

jpp

asked Jun 21, 2018 at 14:42

ozo

1 Answer

Here's one way using pd.concat with NumPy arrays:

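import pandas as pd

# order='F' reads same-named columns one after another (column-major), so they are stacked end to end
res = pd.concat([pd.Series(df[col].values.flatten(order='F'), name=col) \
                 for col in df.columns.unique()], axis=1)
print(res)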
   peter  brian  mike
0      2      3   5.0
1      4      6   7.0
2      4      6   NaN
3      1      5   NaN

Note mike is forced to float since it contains NaN values.
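As a small illustration of that note, the sketch below concatenates two series of unequal length and inspects the resulting dtypes. The series are hand-built stand-ins for the stacked columns, and the conversion to the nullable "Int64" dtype is an optional extra that is not part of the answer above; it assumes a pandas version that supports nullable integer dtypes.

```python
import pandas as pd

# Hand-built stand-ins for two stacked columns of unequal length
peter = pd.Series([2, 4, 4, 1], name="peter")
mike = pd.Series([5, 7], name="mike")

res = pd.concat([peter, mike], axis=1)
print(res.dtypes)  # mike becomes float64 because the missing rows are filled with NaN

# Optional: the nullable integer dtype keeps whole numbers and marks gaps as <NA>
res_int = res.astype({"mike": "Int64"})
print(res_int)
```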

Explanation

df.columns.unique() extracts the unique column names.
df[col] selects every column with a given name (a dataframe when the name is repeated, a series otherwise); .values converts the selection to a NumPy array, and flatten() collapses it into a 1-dimensional array.
pd.Series converts the array into a series object named after the column.
We iterate over all such unique column names via a list comprehension.
pd.concat with axis=1 combines the list of series into a dataframe, padding shorter series with NaN.
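The steps above can be sketched end to end on the sample data from the question. The helper name stack_duplicate_columns is hypothetical and introduced only for illustration. One assumption worth flagging: NumPy's flatten() defaults to row-major order, which interleaves rows across same-named columns. The question's sample happens to give the same result under either order, so this sketch passes order="F" (column-major) explicitly to stack each column in turn.

```python
import pandas as pd

# Sample frame from the question; pandas allows duplicate column names
df = pd.DataFrame(
    [[2, 3, 4, 5, 6],
     [4, 6, 1, 7, 5]],
    columns=["peter", "brian", "peter", "mike", "brian"],
)

def stack_duplicate_columns(frame):
    """Hypothetical helper: stack same-named columns one after another."""
    pieces = []
    for name in frame.columns.unique():
        # Boolean mask on the column labels always yields a DataFrame
        block = frame.loc[:, frame.columns == name]
        # order="F" reads the block column by column
        values = block.to_numpy().flatten(order="F")
        pieces.append(pd.Series(values, name=name))
    # Shorter series are padded with NaN when aligned on the index
    return pd.concat(pieces, axis=1)

res = stack_duplicate_columns(df)
print(res)

# Asymmetric data makes the ordering difference visible
check = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "a"])
stacked = stack_duplicate_columns(check)["a"].tolist()
interleaved = check["a"].to_numpy().flatten().tolist()
print(stacked, interleaved)
```

With the asymmetric frame, the column-major version returns the first "a" column followed by the second, whereas the default row-major flatten alternates between them.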

edited Jun 21, 2018 at 17:59

answered Jun 21, 2018 at 14:59

jpp

2 Comments

ozo

Can you explain the code a little, please? I'd also like to ask: what does the "\" after name=col and before the for mean?

2018-06-21T17:24:44.73Z

jpp

Backslash is just a line continuation character in Python. I've also added an explanation.

2018-06-21T17:59:57.03Z
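To illustrate the line-continuation point from the comments, here is a minimal sketch; the variable names are illustrative only. It also shows that inside parentheses, brackets, or braces Python joins lines implicitly, so a backslash in that position is optional.

```python
# Explicit continuation: the backslash joins this line with the next one
total = 1 + \
        2

# Implicit continuation: inside brackets no backslash is needed
items = [n * 2
         for n in range(3)]

print(total, items)
```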
