I have code that loops through a column of a dataframe. How can I make this code faster? (Python/Pandas)

Asked 4 years, 7 months ago

Viewed 85 times

\$\begingroup\$

I have datasetA with 90,000 rows and datasetB with 5,000 rows. Each dataset has a column called "ID" containing employee IDs. My goal is to create another column in datasetA that flags, with True/False, whether each employee ID in datasetA also appears in datasetB. Some employee IDs most likely appear more than once in both datasets. I am fairly certain that the code I wrote works, but it is far too slow. What could I change to make it faster? Thanks!

```python
# Create the new column that flags whether each ID in datasetA is also in datasetB.
datasetA["inB"] = "Empty"
# Loop through every ID in datasetA.
for id_num in datasetA["ID"]:
    filt = (datasetA["ID"] == id_num)
    if (datasetB["ID"] == id_num).any():
        datasetA.loc[filt, "inB"] = True
    else:
        datasetA.loc[filt, "inB"] = False
```

asked Feb 13, 2021 at 20:09

Nick

\$\endgroup\$

You can do that with an inner join. pandas.pydata.org/pandas-docs/stable/reference/api/…
– Tweakimp, commented Feb 13, 2021 at 22:17
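The join idea from the comment could be sketched roughly as below. This is an illustration, not necessarily what the commenter had in mind: a plain inner join would drop the rows of datasetA that have no match, so the sketch assumes a left join with `indicator=True` instead, and it deduplicates datasetB first so that repeated IDs cannot multiply rows. The toy values and the helper name `b_ids` are made up for the example.

```python
import pandas as pd

# Toy data (hypothetical values) with duplicate IDs on both sides.
datasetA = pd.DataFrame({"ID": ["ID222", "ID233", "ID2123", "ID233"]})
datasetB = pd.DataFrame({"ID": ["ID222", "ID233", "ID212355", "ID233"]})

# Deduplicate datasetB so the join cannot add extra rows to datasetA.
b_ids = datasetB[["ID"]].drop_duplicates()

# A left join keeps every row of datasetA; indicator=True adds a "_merge"
# column that equals "both" when the ID was also found in b_ids.
merged = datasetA.merge(b_ids, on="ID", how="left", indicator=True)

# to_numpy() sidesteps index alignment in case datasetA has a custom index.
datasetA["inB"] = (merged["_merge"] == "both").to_numpy()

print(datasetA)
```

For a simple True/False flag this is more machinery than a membership test needs; a join tends to pay off when columns from datasetB also have to be pulled across.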

1 Answer

\$\begingroup\$

Is this what you want?

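```python
import pandas as pd

datasetA = pd.DataFrame(
    [["ID222"], ["ID233"], ["ID2123"], ["ID233"]], columns=["ID"]
)
datasetB = pd.DataFrame(
    [["ID222"], ["ID233"], ["ID212355"], ["ID233"]], columns=["ID"]
)

# Vectorized membership test: True where the ID in datasetA also appears in datasetB.
datasetA["inB"] = datasetA.ID.isin(datasetB.ID)

# Show one row per distinct (ID, inB) pair; this returns a new frame and leaves datasetA unchanged.
datasetA.drop_duplicates()
```

Output:

```
       ID    inB
0   ID222   True
1   ID233   True
2  ID2123  False
```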
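To check that `isin` gives the same flags as the loop in the question, something like the following sketch could be used. It runs on small random integer IDs (an assumption, chosen only so the slow loop finishes quickly); the function names `flag_with_loop` and `flag_with_isin` are made up for the comparison. The timings are printed rather than quoted because they depend on the machine.

```python
import time

import numpy as np
import pandas as pd

# Small synthetic data (hypothetical) so the row-by-row loop stays quick.
rng = np.random.default_rng(0)
datasetA = pd.DataFrame({"ID": rng.integers(0, 2_000, size=1_000)})
datasetB = pd.DataFrame({"ID": rng.integers(0, 4_000, size=500)})


def flag_with_loop(a, b):
    """Loop from the question: every iteration rescans both ID columns."""
    out = a.copy()
    out["inB"] = False
    for id_num in out["ID"]:
        filt = out["ID"] == id_num
        out.loc[filt, "inB"] = bool((b["ID"] == id_num).any())
    return out["inB"]


def flag_with_isin(a, b):
    """Vectorized membership test, as in the answer."""
    return a["ID"].isin(b["ID"])


start = time.perf_counter()
loop_result = flag_with_loop(datasetA, datasetB)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
isin_result = flag_with_isin(datasetA, datasetB)
isin_seconds = time.perf_counter() - start

# Both approaches should flag exactly the same rows.
same = bool((loop_result.to_numpy() == isin_result.to_numpy()).all())
print(f"identical flags: {same}")
print(f"loop: {loop_seconds:.4f} s, isin: {isin_seconds:.4f} s")
```

Duplicate IDs in either dataset need no special handling with `isin`, since each row of datasetA is tested independently against the set of values in datasetB.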

answered Feb 13, 2021 at 23:05

Liam McIntyre

\$\endgroup\$
