How can I "concat" rows by same value in a column in Pandas?

Question 1

I would like to combine the values of all rows that share the same value in one column (here, ID) into a single row, and get the edited dataframe back.

Input Data :

ID   F_Name  L_Name  Address  SSN    Phone
123  Sam     Doe     123      12345  111-111-1111
123  Sam     Doe     123      12345  222-222-2222
123  Sam     Doe     abc345   12345  111-111-1111
123  Sam     Doe     abc345   12345  222-222-2222
456  Naveen  Gupta   456      45678  333-333-3333
456  Manish  Gupta   456      45678  333-333-3333

Expected Output Data :

myschema = [
    {
        "ID": "123",
        "F_Name": "Sam",
        "L_Name": "Doe",
        "Address": ["123", "abc345"],
        "Phone": ["111-111-1111", "222-222-2222"],
        "SSN": "12345"
    },
    {
        "ID": "456",
        "F_Name": ["Naveen", "Manish"],
        "L_Name": "Gupta",
        "Address": "456",
        "Phone": ["333-333-3333"],
        "SSN": "45678"
    }
]

Code Tried :

import pandas as pd

df = pd.read_csv('data.csv')
print(df)

Question 2

Please specify an expected output and the actual output you are getting. The 'code' you have tried is loading a dataframe and printing it. It does not relate to your question.

Question 3

@samarth I have edited my question. I have only loaded the df, and I don't know how to achieve the output in pandas.

Question 4

Note that if you are willing to have NumPy arrays instead of lists, it's more concise to aggregate pd.unique directly: myschema = df.groupby('ID', as_index=False).agg(pd.unique).to_dict(orient='records')

Question 5

Try groupby() + agg():

# One row per ID: keep a single unique value as a scalar, otherwise a list
myschema = (df.groupby('ID', as_index=False)
              .agg(lambda x: list(set(x))[0] if len(set(x)) == 1 else list(set(x)))
              .to_dict('records'))

OR

If order is important, then aggregate with pd.unique():

# Same idea, but pd.unique() keeps the order of first appearance
myschema = (df.groupby('ID', as_index=False)
              .agg(lambda x: pd.unique(x)[0] if len(pd.unique(x)) == 1 else pd.unique(x).tolist())
              .to_dict('records'))

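In the code above, we group the dataframe on the ID column only. For every other column, the lambda collects the unique values within each group: if there is exactly one unique value, it is kept as a scalar (the [0] selects that single value); otherwise the unique values are returned as a list. The first version removes duplicates with set(), so the order of the items in each list is not guaranteed, while the second uses pd.unique(), which keeps the order of first appearance. Finally, to_dict(...) converts the aggregated result to a list of dictionaries, one per ID.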

Output of myschema:

[{'ID': 123,
 'F_Name': 'Sam',
 'L_Name': 'Doe',
 'Address': ['abc345', '123'],
 'SSN': 12345,
 'Phone': ['222-222-2222', '111-111-1111']},
 {'ID': 456,
 'F_Name': ['Naveen', 'Manish'],
 'L_Name': 'Gupta',
 'Address': '456',
 'SSN': 45678,
 'Phone': '333-333-3333'}]
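For anyone who wants to reproduce this without a data.csv file, here is a minimal self-contained sketch of the same approach. It rests on a few assumptions that are not in the original answer: the sample rows are pasted inline as CSV text, every column is read as a string (so IDs and SSNs come back as '123' rather than 123), and the helper name collapse is purely illustrative. It uses dict.fromkeys() instead of set() or pd.unique() to drop duplicates while keeping first-seen order.

```python
import io

import pandas as pd

# Sample rows copied from the question. dtype=str is an assumption made here
# so that IDs, SSNs and numeric-looking addresses stay as strings.
raw = """ID,F_Name,L_Name,Address,SSN,Phone
123,Sam,Doe,123,12345,111-111-1111
123,Sam,Doe,123,12345,222-222-2222
123,Sam,Doe,abc345,12345,111-111-1111
123,Sam,Doe,abc345,12345,222-222-2222
456,Naveen,Gupta,456,45678,333-333-3333
456,Manish,Gupta,456,45678,333-333-3333
"""
df = pd.read_csv(io.StringIO(raw), dtype=str)


def collapse(values):
    """Illustrative helper: a scalar if one distinct value, else an ordered list."""
    distinct = list(dict.fromkeys(values))  # de-duplicate, keep first-seen order
    return distinct[0] if len(distinct) == 1 else distinct


# Group on ID only, collapse every other column, then emit one dict per ID.
myschema = (
    df.groupby("ID", as_index=False)
      .agg(collapse)
      .to_dict("records")
)
print(myschema)
```

Because the columns are read as strings in this sketch, the scalar values print with quotes, unlike the integer ID and SSN in the output shown above.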

Question 6

The code is working fine. Can you give some details about the code? That would be helpful.

Question 7

I was checking the code with a new ID as well, but it is not working for that. I have edited my input data. Do you have any suggestions on that?

Question 8

@NaveenGupta Added details, kindly have a look :)

Question 9

Thank you so much. Actually, I need to group on the ID column only. I have updated my input and output datasets. Let me know if you have something to say on that.

Question 10

You can change the grouping to only have ID and remove the [0] at the end: df.groupby('ID', as_index=False).agg(lambda x: list(set(x))).to_dict('records')
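As a rough illustration of that "always a list" variant, the sketch below runs it on a small hand-built frame. The frame keeps only three of the question's columns for brevity, and sorted(set(x)) replaces list(set(x)) purely so the printed order is deterministic; both choices are assumptions of this example, not part of the comment above.

```python
import pandas as pd

# A small hand-built frame; values are taken from the question's sample data.
df = pd.DataFrame(
    {
        "ID": [123, 123, 456, 456],
        "F_Name": ["Sam", "Sam", "Naveen", "Manish"],
        "Phone": ["111-111-1111", "222-222-2222", "333-333-3333", "333-333-3333"],
    }
)

# Every non-key column becomes a list of its distinct values, even when there
# is only one; sorted() is used here only to make the order deterministic.
records = (
    df.groupby("ID", as_index=False)
      .agg(lambda x: sorted(set(x)))
      .to_dict("records")
)
print(records)
```

With this variant every value has the same type (a list), which can be easier to consume downstream than a mix of scalars and lists.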

Anurag Dabas · Accepted Answer · 2021-08-05 06:53:53Z

