2

I want to use python to determine if the first instance of an ID value in the "Id" column has a match on a later row in that same column. If it does, then I want to take the value from the "Avail" column for the rows which match that initial "Id" value. Then I want to delete the rows with the duplicate Ids.

Here's my sample data: I have a CSV file that has data like this:

Id,First,Last,Avail 
abcdefg,John,Smith,4164667a-5dca-4ec6-a495-4be5b135d868=immediate 
dgasgas,Nancy,Adams,f98a8fbd-fb88-49b9-894e-631ba2a6f369=immediate 
gaytrjhu,John,Smith,e24ddf4c-c79f-4a84-a4ed-d92a10cc9e15=immediate 
abcdefg,John,Smith,3ec0c158-8782-41ff-8388-5a10b9261b60=immediate 
abcdefg,John,Smith,3ec0c158-8782-41ff-8388-c5dfe3b1276c=relative|7 

Desired output (v1) (Please note that I don't care about the "First" or "Last" columns from the duplicate rows. I only care about the "Avail" data from those:

Id,First,Last,Avail 
abcdefg,John,Smith,4164667a-5dca-4ec6-a495-4be5b135d868=immediate;3ec0c158-8782-41ff-8388-5a10b9261b60=immediate;3ec0c158-8782-41ff-8388-5a10b9261b60=immediate 
dgasgas,Nancy,Adams,f98a8fbd-fb88-49b9-894e-631ba2a6f369=immediate 
gaytrjhu,John,Smith,e24ddf4c-c79f-4a84-a4ed-d92a10cc9e15=immediate 
abcdefg,Nancy,Adams,3ec0c158-8782-41ff-8388-5a10b9261b60=immediate 
abcdefg,John,Smith,3ec0c158-8782-41ff-8388-c5dfe3b1276c=relative|7 

Then I'd like to delete the "duplicate" rows, leaving this:

Id,First,Last,Avail 
 abcdefg,John,Smith,4164667a-5dca-4ec6-a495-4be5b135d868=immediate;3ec0c158-8782-41ff-8388-5a10b9261b60=immediate;3ec0c158-8782-41ff-8388-5a10b9261b60=immediate 
 dgasgas,Nancy,Adams,f98a8fbd-fb88-49b9-894e-631ba2a6f369=immediate 
 gaytrjhu,John,Smith,e24ddf4c-c79f-4a84-a4ed-d92a10cc9e15=immediate 
asked Apr 5, 2018 at 23:33
1
  • pandas groupby? Commented Apr 5, 2018 at 23:36

1 Answer 1

1
import pandas as pd
df = pd.DataFrame(data=[
 [1, 'John', 'Smith', 'a'],
 [1, 'John', 'Smith', 'b'],
 [2, 'Kate', 'Smith', 'c'],
 ],
 columns=['ID', 'First', 'Last', 'Avail']
)
output = (df
 .groupby(['ID', 'First', 'Last'], as_index=False)
 .agg({'Avail': lambda x: ';'.join(x)}))

You can use groupby as @Sphinx suggested. An example with the style of output you requested is above.

answered Apr 6, 2018 at 0:39

Comments

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.