
I would like to concatenate the values of several rows into one row per group in a DataFrame, grouping on one column (ID). Then I would like to get back the edited DataFrame.

Input data:

ID F_Name L_Name Address SSN Phone
123 Sam Doe 123 12345 111-111-1111
123 Sam Doe 123 12345 222-222-2222
123 Sam Doe abc345 12345 111-111-1111
123 Sam Doe abc345 12345 222-222-2222
456 Naveen Gupta 456 45678 333-333-3333
456 Manish Gupta 456 45678 333-333-3333

Expected output data:

myschema = [
  {
    "ID": "123",
    "F_Name": "Sam",
    "L_Name": "Doe",
    "Address": "[123, abc345]",
    "Phone": "[111-111-1111, 222-222-2222]",
    "SSN": "12345"
  },
  {
    "ID": "456",
    "F_Name": "[Naveen, Manish]",
    "L_Name": "Gupta",
    "Address": "456",
    "Phone": "[333-333-3333]",
    "SSN": "45678"
  }
]

Code tried:

import pandas as pd

df = pd.read_csv('data.csv')
print(df)
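If `data.csv` is laid out like the sample above, with columns separated by spaces rather than commas (an assumption; the file itself is not shown), the default `sep=','` of `read_csv` would put each whole row into a single column. A minimal sketch that parses the sample with `sep=r"\s+"`, reading from an in-memory string via `io.StringIO` in place of the file (`SAMPLE` is a hypothetical name):

```python
import io

import pandas as pd

# The sample input from the question; in practice this text would come from data.csv.
SAMPLE = """\
ID F_Name L_Name Address SSN Phone
123 Sam Doe 123 12345 111-111-1111
123 Sam Doe 123 12345 222-222-2222
123 Sam Doe abc345 12345 111-111-1111
123 Sam Doe abc345 12345 222-222-2222
456 Naveen Gupta 456 45678 333-333-3333
456 Manish Gupta 456 45678 333-333-3333
"""

# sep=r"\s+" splits on any run of whitespace instead of on commas.
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")
print(df)
```

Because `Address` contains the non-numeric value `abc345`, that whole column is read as text, so `123` stays the string `'123'` there, while `ID` and `SSN` are parsed as integers.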
asked Aug 5, 2021 at 6:37
  • Please specify an expected output and the actual output you are getting. The 'code' you have tried is loading a dataframe and printing it. It does not relate to your question. Commented Aug 5, 2021 at 6:54
  • @samarth I have edited my question. I have only loaded the df, and I don't know how to get this output in pandas. Commented Aug 5, 2021 at 7:09
  • note that if you are willing to have numpy arrays instead of lists, it's more concise to just aggregate pd.unique directly: myschema = df.groupby('ID', as_index=False).agg(pd.unique).to_dict(orient='records') Commented Aug 5, 2021 at 7:42

1 Answer


Try groupby() + agg():

myschema = (df.groupby('ID', as_index=False)
            .agg(lambda x: list(set(x))[0] if len(set(x)) == 1 else list(set(x)))
            .to_dict('records'))

OR

If order matters, aggregate with pd.unique() instead:

myschema = (df.groupby('ID', as_index=False)
            .agg(lambda x: pd.unique(x)[0] if len(pd.unique(x)) == 1 else pd.unique(x).tolist())
            .to_dict('records'))

In the code above, we group the DataFrame on the ID column and then aggregate every other column within each group. For each column we find the unique values (with set() in the first version, or with pd.unique() in the order-preserving one). If a group has only one unique value, we return that value itself (the element at position 0); otherwise we return the list of unique values. Finally, to_dict('records') converts the aggregated result to a list of dictionaries, one per ID.

Output of myschema:

[{'ID': 123,
 'F_Name': 'Sam',
 'L_Name': 'Doe',
 'Address': ['abc345', '123'],
 'SSN': 12345,
 'Phone': ['222-222-2222', '111-111-1111']},
 {'ID': 456,
 'F_Name': ['Naveen', 'Manish'],
 'L_Name': 'Gupta',
 'Address': '456',
 'SSN': 45678,
 'Phone': '333-333-3333'}]
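The steps explained above can also be sketched with a named helper instead of a lambda, so the unique values are computed only once per column and group. This is a sketch, not the answer's exact code: `collapse` and `SAMPLE` are hypothetical names, the input is assumed to be whitespace-separated as shown in the question, and `Series.drop_duplicates()` is swapped in for `set()`/`pd.unique()` because it keeps the first-appearance order and, via `tolist()`, yields plain Python values.

```python
import io

import pandas as pd

# Sample input from the question, assumed to be whitespace-separated as displayed.
SAMPLE = """\
ID F_Name L_Name Address SSN Phone
123 Sam Doe 123 12345 111-111-1111
123 Sam Doe 123 12345 222-222-2222
123 Sam Doe abc345 12345 111-111-1111
123 Sam Doe abc345 12345 222-222-2222
456 Naveen Gupta 456 45678 333-333-3333
456 Manish Gupta 456 45678 333-333-3333
"""


def collapse(values: pd.Series):
    """Hypothetical helper: return the single unique value, or a list of them.

    drop_duplicates() keeps the first occurrence of each value, so the list
    follows the order in which the values appear in the input rows.
    """
    uniques = values.drop_duplicates().tolist()
    return uniques[0] if len(uniques) == 1 else uniques


df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")

# Group on ID only; every other column is collapsed per group by the helper.
myschema = df.groupby("ID", as_index=False).agg(collapse).to_dict("records")
print(myschema)
```

With this helper, `Address` for ID 123 comes out as `['123', 'abc345']` in input order, whereas the `set()` version may return the values in any order, as in the output above.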
answered Aug 5, 2021 at 6:53

The code is working fine. Could you add some details about how the code works? That would be helpful.
I checked the code with a new ID as well, but it does not work for that. I have edited my input data. Do you have any suggestions on that?
@NaveenGupta added details, kindly have a look :)
Thank you so much. Actually, I need to group on the ID column only. I have updated my input and output datasets. Let me know if you have something to say on that.
You can change the grouping to use only ID and remove the [0] at the end: df.groupby('ID', as_index=False).agg(lambda x: list(set(x))).to_dict('records')
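A sketch of that ID-only variant, where every column becomes a list, with one substitution: `Series.drop_duplicates().tolist()` stands in for `list(set(x))`, so each list keeps the order in which values first appear instead of an arbitrary set order. The sample data is assumed to be whitespace-separated as shown in the question, and `SAMPLE` is a hypothetical name.

```python
import io

import pandas as pd

# Sample input from the question, assumed to be whitespace-separated as displayed.
SAMPLE = """\
ID F_Name L_Name Address SSN Phone
123 Sam Doe 123 12345 111-111-1111
123 Sam Doe 123 12345 222-222-2222
123 Sam Doe abc345 12345 111-111-1111
123 Sam Doe abc345 12345 222-222-2222
456 Naveen Gupta 456 45678 333-333-3333
456 Manish Gupta 456 45678 333-333-3333
"""

df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")

# Every non-key column becomes a list of its unique values, in first-appearance order.
myschema = (
    df.groupby("ID", as_index=False)
    .agg(lambda s: s.drop_duplicates().tolist())
    .to_dict("records")
)
print(myschema)
```

Always returning lists gives every record the same shape (for example `'SSN': [12345]`), which can be easier to consume downstream than a mix of scalars and lists.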