Given a dataframe with three columns of text blobs to search through, which can be found in this Gist.

And three keywords that I want to identify in this text:

branches_of_sci = ['bio', 'chem', 'physics']

I've written the following code to identify the presence of these keywords:

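import pandas as pd

dfq_col = ['Text A', 'Text B', 'Text C']
for branch in branches_of_sci:
    df[branch] = False  # start each keyword column as all False so |= can update it
    for col in dfq_col:
        temp_list = []
        for row in df[col]:
            if not isinstance(row, str):
                # missing cells (NaN) are floats, not strings: count as no match
                temp_list.append(False)
            else:
                # substring test; unlike row.find(branch) > 0, this also catches a match at index 0
                temp_list.append(branch in row)
        df[branch] |= pd.Series(temp_list, index=df.index)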

This is the result of the data I linked to:

[Image: result table]

I think the main problem here is that I'm using a for-loop when I should be using some sort of dataframe-specific function, but I'm not sure how to restructure the code to accomplish this.

Jamal
asked Oct 28, 2017 at 18:40

1 Answer

import pandas as pd
df = pd.read_clipboard(sep=',') # copied data from the gist
branches_of_sci = ['bio', 'chem', 'physics']
for branch in branches_of_sci:
    df[branch] = df.astype(str).sum(axis=1).str.contains(branch)

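In my limited experience, explicit for loops over rows are almost always the wrong approach in Pandas. Its primary benefit is vectorization: built-in methods operate on whole columns at once instead of making you step through rows in Python, so the code is shorter and usually faster.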
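To illustrate the vectorization point, here is a sketch of a column-wise variant (a swapped-in technique, not the one-liner in this answer): it runs str.contains on each text column separately and combines the results with any(axis=1). Because the columns are never concatenated, a keyword cannot be matched across the boundary between two columns, and na=False treats missing values as "no match" instead of searching the string 'nan'. The small DataFrame is hypothetical and only stands in for the Gist data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Gist data; NaN marks a missing text blob
df = pd.DataFrame({
    'Text A': ['biology notes', np.nan, 'quantum physics'],
    'Text B': ['organic chem', 'astro', np.nan],
    'Text C': [np.nan, 'biochem lab', 'history'],
})

dfq_col = ['Text A', 'Text B', 'Text C']
branches_of_sci = ['bio', 'chem', 'physics']

for branch in branches_of_sci:
    # One vectorized substring search per text column:
    # regex=False treats the keyword literally, na=False maps NaN to False
    hits = df[dfq_col].apply(
        lambda col: col.str.contains(branch, regex=False, na=False)
    )
    # A row matches if any of its text columns contains the keyword
    df[branch] = hits.any(axis=1)

print(df[branches_of_sci])
```

The only remaining loop runs over the three keywords, not over rows, which matches the structure of the one-liner above.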

Here is a breakdown of the key line, df[branch] = df.astype(str).sum(axis=1).str.contains(branch):

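  1. df[branch] = ... creates a new dataframe column named after the keyword
  2. df.astype(str) converts every value in the dataframe to a string (missing values become 'nan')
  3. .sum(axis=1) concatenates the strings in each row horizontally (i.e. axis=1), giving one string per row
  4. .str.contains() uses the built-in vectorized string search (see docs); by default the pattern is treated as a regular expression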
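The steps above can be traced on a tiny DataFrame, made up for illustration rather than taken from the Gist. Printing the intermediate result shows two side effects of the one-liner: astype(str) turns a missing value into the literal string 'nan', and sum(axis=1) joins the columns with no separator. This sketch also builds the combined text once, before the loop; the one-liner rebuilds it on every pass, so later passes also search the 'True'/'False' text of keyword columns already added (harmless for these three keywords).

```python
import numpy as np
import pandas as pd

# Hypothetical two-row sample, not the Gist data
df = pd.DataFrame({
    'Text A': ['biology', np.nan],
    'Text B': ['chem lab', 'physics'],
})

as_text = df.astype(str)        # step 2: every cell becomes a string; NaN -> 'nan'
combined = as_text.sum(axis=1)  # step 3: concatenate each row's strings, no separator
print(combined.tolist())        # ['biologychem lab', 'nanphysics']

for branch in ['bio', 'chem', 'physics']:
    # steps 1 and 4: substring search on the combined text, stored as a new column
    df[branch] = combined.str.contains(branch)

print(df[['bio', 'chem', 'physics']])
```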

Hopefully that helps.

Seanny123
answered Oct 29, 2017 at 9:46
