3

I am trying to combine rows of a dataframe in the event that there is a duplicate in one column. The dataframe looks like the following.

Name Code X Y
 A 123 10 11
 B 456 12 13
 C 123 15 16

I want to combine on Code. So if the Code is the same, combine the other data separated by a comma. The resulting df would look like this:

Name Code X Y
A,C 123 10,15 11,16
 B 456 12 13

My approach was the following:

 df = df.groupby(['Name','Code','Y'])['X'].astype(str).apply(', '.join).reset_index() 
 df = df.groupby(['Name','Code','X'])['Y'].astype(str).apply(', '.join).reset_index() 

I get the following error :

"Cannot access callable attribute 'astype' of 'SeriesGroupBy' objects, try using the 'apply' method"

I have been unable to figure out how to use apply to cast as type str, any tips?

asked Mar 12, 2019 at 14:16

3 Answers 3

5

Create index from Code column fo avoid casting to strings, then cast all columns and aggregate by index function join:

df = df.set_index('Code').astype(str).groupby(level=0).agg(', '.join).reset_index()
#pandas 0.24+
#df = df.set_index('Code').astype(str).groupby('Code').agg(', '.join).reset_index()
print (df)
 Code Name X Y
0 123 A, C 10, 15 11, 16
1 456 B 12 13
answered Mar 12, 2019 at 14:23

Comments

3

Another layout

(df.set_index('Code').astype(str)+',').sum(level=0).apply(lambda x : x.str.strip(','))
Out[50]: 
 Name X Y
Code 
123 A,C 10,15 11,16
456 B 12 13
answered Mar 12, 2019 at 14:26

6 Comments

Why would this cause a number like '0.007' to become '0.00699999999' after running it? For some reason every solution offered changes the numbers after combining them.
@MaxB that is the problem with pandas , if that is the case , format you float number first :-)
is there something I should do before calling this? Or attempt to round the numbers after? Rounding doesn't seem like it will solve my problem completely.
@MaxB I mean something like df['float col']=df['float col'].map('${:,.3f}'.format)
@MaxB let me show your the link . :-) stackoverflow.com/questions/20937538/…
|
1

More general solution:

import pandas as pd
df = pd.DataFrame([['A',123,10,11],['B',456,12,13],['C',123,15,16]],columns=['Name','Code','X','Y'])
def f(x):
 return ','.join(x)
df = df.astype(str).groupby('Code').agg(f)
df.index = [int(i) for i in df.index.tolist()] 

The last line changes the index from string back to integer type. I added this solution because it is easy to understand, however not the most elegant.

answered Mar 12, 2019 at 14:29

2 Comments

This is my original solution, problem is column Code is casted to strings
df.index = [int(i) for i in df.index.tolist()]

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.