Searching for keywords in dataframe

Asked 7 years, 11 months ago

Viewed 13k times

\$\begingroup\$

I have a dataframe with three columns of text blobs to search through; the data can be found in this Gist.

There are three keywords that I want to identify in this text:

branches_of_sci = ['bio', 'chem', 'physics']

I've written the following code to identify the presence of these keywords:

dfq_col = ['Text A', 'Text B', 'Text C']
for branch in branches_of_sci:
    for col in dfq_col:
        temp_list = []
        for row in df[col]:
            if type(row) is not str:
                temp_list.append(False)
            elif type(row) is str:
                temp_list.append(row.find(branch)>0)
        df[branch] |= temp_list

This is the result of the data I linked to:

(image: table result)

I think the main problem here is that I'm using a for-loop when I should be using some sort of dataframe-specific function, but I'm not sure how to restructure the code to accomplish this.

edited Oct 29, 2017 at 21:32

Jamal

asked Oct 28, 2017 at 18:40

Seanny123

\$\endgroup\$

1 Answer

\$\begingroup\$

import pandas as pd
df = pd.read_clipboard(sep=',') # copied data from the gist
branches_of_sci = ['bio', 'chem', 'physics']
for branch in branches_of_sci:
    df[branch] = df.astype(str).sum(axis=1).str.contains(branch)

In my limited experience, row-by-row for loops are almost always the wrong tool when using pandas. The primary benefit of pandas is vectorization, so using the built-in methods is typically best.
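To make the contrast concrete, here is a minimal sketch on a made-up column (the sample values and the names `text`, `looped`, and `vectorized` are illustrative assumptions, not taken from the gist). It runs a row-by-row check in the style of the question's loop next to `Series.str.contains`. It also shows one detail worth noting: `str.find` returns 0 for a match at the very start of a string, so a `> 0` test skips those rows.

```python
import pandas as pd

# Hypothetical sample column; the values are made up for illustration.
text = pd.Series(["biology lab", "organic chem", None, "astrophysics"])

# Row-by-row check in the style of the question's loop.
# "biology lab".find("bio") is 0, so the `> 0` test reports False for it.
looped = [isinstance(v, str) and v.find("bio") > 0 for v in text]

# Vectorized check: na=False maps missing values to False,
# regex=False treats the keyword as a literal substring.
vectorized = text.str.contains("bio", na=False, regex=False)

print(looped)               # [False, False, False, False]
print(vectorized.tolist())  # [True, False, False, False]
```

If the loop is kept for any reason, `branch in row` or `row.find(branch) >= 0` would be the substring test that also catches a match at position 0.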

Here is a breakdown of the main expression, df[branch] = df.astype(str).sum(axis=1).str.contains(branch):

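df[branch] creates a new dataframe column named after the keyword
df.astype(str) converts every column in the dataframe to strings
.sum(axis=1) concatenates the string columns horizontally (i.e. axis=1), giving one combined string per row
.str.contains() uses the built-in string search to test whether each combined string contains the keyword; the pattern is treated as a regular expression by default (see docs)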
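One caveat with concatenating every column is that the search then runs over the glued-together row, so a keyword could in principle match across the boundary between two columns, and depending on the pandas version missing values may be turned into the text "nan" first. If that matters, a column-wise variant is one possible alternative. The sketch below is not the approach above but a swapped-in variant: it assumes the text columns are named as in the question, uses a small made-up frame in place of the gist data, and the names `text_cols` and `hits` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the gist data; column names follow the question.
df = pd.DataFrame({
    "Text A": ["biology notes", None, "misc"],
    "Text B": ["nothing here", "chem lab", None],
    "Text C": [None, "more physics", "b"],
})
text_cols = ["Text A", "Text B", "Text C"]
branches_of_sci = ["bio", "chem", "physics"]

for branch in branches_of_sci:
    # Search each text column on its own; missing values count as "no match"
    # and the keyword is treated as a literal substring, not a regex.
    hits = df[text_cols].apply(
        lambda col: col.str.contains(branch, na=False, regex=False)
    )
    # A row is flagged if the keyword appears in any of its text columns.
    df[branch] = hits.any(axis=1)

print(df[branches_of_sci])
```

Restricting the search to `text_cols` also keeps the boolean columns added in earlier iterations out of later searches.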

Hopefully that helps.

edited Oct 29, 2017 at 18:40

Seanny123

answered Oct 29, 2017 at 9:46

Alex Cook

\$\endgroup\$
