Comparing columns in different pandas data frames and filling in a new column

Asked 6 years, 4 months ago

Viewed 159 times

I have two dataframes. The first contains company names and their corresponding texts; the texts are stored as lists:

supplier_company_name  Main_Text
JDA SOFTWARE           ['Supply chains', 'The answer is simple -RunJDA!']
PTC                    ['Hello', 'Solution']

The second dataframe contains texts extracted from each company's website:

   Company       Text
0  JDA SOFTWARE  About | JDA Software
1  JDA SOFTWARE  833.JDA.4ROI
2  JDA SOFTWARE  Contact Us
3  JDA SOFTWARE  Customer Support
4  PTC           Training
5  PTC           Partner Advantage

I want to create a new column in the second dataframe: True if the text extracted from the web matches any item in the list in the Main_Text column of the first dataframe for the same company, and False otherwise.

Code:

from tqdm import tqdm

target = []
for x in tqdm(range(len(df['supplier_company_name']))):  # company names in df (first dataframe)
    for y in range(len(samp['Company'])):  # company names in samp (second dataframe)
        if samp['Company'][y] == df['supplier_company_name'][x]:  # the company names match
            # check whether the scraped text (not the company name) is in that company's list
            if samp['Text'][y] in df['Main_Text'][x]:
                target.append(True)
            else:
                target.append(False)

How can I change my code to run efficiently?

edited May 3, 2019 at 13:41 by AlexV

asked May 3, 2019 at 10:51 by DGS

1 Answer

I’ll assume that your first dataframe (df) has unique company names. If so, you can index it by company name and extract the one remaining column, Main_Text, as a Series that behaves much like a good old dict:

main_text = df.set_index('supplier_company_name')['Main_Text']
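To make the dict-like behaviour concrete, here is a minimal sketch built from the sample rows in the question. The uniqueness check is my addition: it only spells out the assumption stated above.

```python
import pandas as pd

# Sample rows copied from the question; `df` is the question's first dataframe.
df = pd.DataFrame({
    'supplier_company_name': ['JDA SOFTWARE', 'PTC'],
    'Main_Text': [
        ['Supply chains', 'The answer is simple -RunJDA!'],
        ['Hello', 'Solution'],
    ],
})

# The lookup only behaves like a dict if each company appears exactly once.
assert df['supplier_company_name'].is_unique

# Index by company name and keep only the Main_Text column as a Series.
main_text = df.set_index('supplier_company_name')['Main_Text']

# With a unique index, .loc with a single label returns the stored list itself.
ptc_texts = main_text.loc['PTC']
print(ptc_texts)  # ['Hello', 'Solution']
```

If the company names were not unique, .loc would return a Series of lists instead of a single list, and a later membership test would no longer mean what you want.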

Now we just need to iterate over each row in samp, fetch the main text for the company in its Company column, and produce a boolean from whether its Text column appears in that list. This is a job for apply:

target = samp.apply(lambda row: row['Text'] in main_text.loc[row['Company']], axis=1)
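Putting both steps together, a small end-to-end sketch might look as follows. The data comes from the question, except for the last samp row ('PTC', 'Solution'), which is a hypothetical row I added so that at least one match shows up. The column name 'target' is also my choice, and rows are accessed by column label.

```python
import pandas as pd

# First dataframe from the question: one list of texts per company.
df = pd.DataFrame({
    'supplier_company_name': ['JDA SOFTWARE', 'PTC'],
    'Main_Text': [
        ['Supply chains', 'The answer is simple -RunJDA!'],
        ['Hello', 'Solution'],
    ],
})

# Second dataframe: a few scraped rows from the question,
# plus one hypothetical row ('PTC', 'Solution') to produce a match.
samp = pd.DataFrame({
    'Company': ['JDA SOFTWARE', 'JDA SOFTWARE', 'PTC', 'PTC'],
    'Text': ['About | JDA Software', 'Contact Us', 'Training', 'Solution'],
})

# Dict-like lookup: company name -> list of texts (assumes unique names).
main_text = df.set_index('supplier_company_name')['Main_Text']

# Row-wise membership test; the result is aligned with samp's index,
# so it can be stored directly as a new column.
samp['target'] = samp.apply(
    lambda row: row['Text'] in main_text.loc[row['Company']],
    axis=1,
)
print(samp)
```

Note that main_text.loc raises a KeyError for any company in samp that is missing from df, so this sketch assumes every scraped company has an entry in the first dataframe.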

answered May 3, 2019 at 12:49 by 301_Moved_Permanently
