I would like to concatenate the values from multiple rows into a single row of a dataframe, grouped by one column, and get the edited dataframe back.
Input Data :
ID F_Name L_Name Address SSN Phone
123 Sam Doe 123 12345 111-111-1111
123 Sam Doe 123 12345 222-222-2222
123 Sam Doe abc345 12345 111-111-1111
123 Sam Doe abc345 12345 222-222-2222
456 Naveen Gupta 456 45678 333-333-3333
456 Manish Gupta 456 45678 333-333-3333
Expected Output Data :
myschema = {
"ID":"123"
"F_Name":"Sam"
"L_Name":"Doe"
"Address":"[123, abc345]"
"Phone":"[111-111-1111,222-222-2222]"
"SSN":"12345"
}
{
"ID":"456"
"F_Name":"[Naveen, Manish]"
"L_Name":"Gupta"
"Address":"456"
"Phone":"[333-333-3333]"
"SSN":"45678"
}
Code Tried :
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
1 Answer
Try groupby() + agg():
myschema = (df.groupby('ID', as_index=False)
              .agg(lambda x: list(set(x))[0] if len(set(x)) == 1 else list(set(x)))
              .to_dict('records'))
Or, if order is important, aggregate with pd.unique():
myschema = (df.groupby('ID', as_index=False)
              .agg(lambda x: pd.unique(x)[0] if len(pd.unique(x)) == 1 else pd.unique(x).tolist())
              .to_dict('records'))
The code above groups the dataframe on 'ID' and then aggregates every remaining column within each group. For each column it finds the unique values (by building a set and converting it to a list, or by calling pd.unique()). If a group has only one unique value for that column, the value at position 0 is selected; otherwise the whole list is kept. Finally, the aggregated result is converted to a list of dictionaries.
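As a rough end-to-end sketch of that approach, the script below inlines the sample rows from the question through io.StringIO instead of reading data.csv (an assumption made only so it runs on its own). The helper name collapse is hypothetical and is not part of pandas; it simply wraps the lambda from the pd.unique() variant.

```python
import io

import pandas as pd

# Sample rows copied from the question, whitespace-separated.
# This stands in for data.csv so the sketch is self-contained.
RAW = """\
ID F_Name L_Name Address SSN Phone
123 Sam Doe 123 12345 111-111-1111
123 Sam Doe 123 12345 222-222-2222
123 Sam Doe abc345 12345 111-111-1111
123 Sam Doe abc345 12345 222-222-2222
456 Naveen Gupta 456 45678 333-333-3333
456 Manish Gupta 456 45678 333-333-3333
"""


def collapse(values):
    """Hypothetical helper: one distinct value stays a scalar,
    several distinct values become a list in first-seen order."""
    distinct = list(pd.unique(values))
    return distinct[0] if len(distinct) == 1 else distinct


df = pd.read_csv(io.StringIO(RAW), sep=r"\s+")

# Group on ID only; every other column is collapsed per group.
myschema = df.groupby("ID", as_index=False).agg(collapse).to_dict("records")
print(myschema)
```

Naming the helper instead of using an inline lambda means pd.unique() is called once per column rather than up to three times, and the scalar-or-list rule is easier to change later.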
Output of myschema:
[{'ID': 123,
'F_Name': 'Sam',
'L_Name': 'Doe',
'Address': ['abc345', '123'],
'SSN': 12345,
'Phone': ['222-222-2222', '111-111-1111']},
{'ID': 456,
'F_Name': ['Naveen', 'Manish'],
'L_Name': 'Gupta',
'Address': '456',
'SSN': 45678,
'Phone': '333-333-3333'}]
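Note that the Address and Phone lists in this output are not in input order, because a set does not preserve insertion order. A small sketch of the difference, where addresses is a hypothetical stand-in for the Address values of one group:

```python
import pandas as pd

# Hypothetical stand-in for one group's Address column.
addresses = pd.Series(["123", "123", "abc345", "abc345"])

# pd.unique keeps values in order of first appearance.
ordered = list(pd.unique(addresses))

# A set of strings has no guaranteed order, so sort it
# if a reproducible result is needed.
stable = sorted(set(addresses))

print(ordered, stable)
```

Sorting gives a reproducible result but not necessarily the original row order, so pd.unique() is the better fit when input order matters.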
12 Comments
ID and remove the [0] at the end: df.groupby('ID', as_index=False).agg(lambda x: list(set(x))).to_dict('records')
pd.unique directly: myschema = df.groupby('ID', as_index=False).agg(pd.unique).to_dict(orient='records')