
I have two dataframes. The first contains names and years:

**name** **year**
ram 1873
rob 1900

The second contains names and texts:

**name** **text**
ram A good kid
ram He was born on 1873
rob He is tall
rob He is 12 yrs old
rob His father died at 1900

I want to find the indices of the rows in the second dataframe whose name matches a name in the first dataframe and whose text contains that name's year from the first dataframe.

The result should be the indices 1 and 4.

My code:

ind_list = []
for ind1, old in enumerate(A.name):
    for ind2, new in enumerate(B.name):
        if A.name[ind1] == B.name[ind2]:
            # `in` on a string needs a string on the left; str() is a no-op if year is already text
            if str(A.year[ind1]) in B.text[ind2]:
                ind_list.append(ind2)

Is there a better way to write the above code?
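One hedged alternative to the nested loop, assuming each name appears only once in A and the years are integers: build a name-to-year dict once, then make a single pass over B. This is a sketch rather than the asker's method; year_by_name is an illustrative name.

```python
import pandas as pd

# Sample data from the question; years assumed to be integers.
A = pd.DataFrame({"name": ["ram", "rob"], "year": [1873, 1900]})
B = pd.DataFrame({
    "name": ["ram", "ram", "rob", "rob", "rob"],
    "text": ["A good kid", "He was born on 1873", "He is tall",
             "He is 12 yrs old", "His father died at 1900"],
})

# Assumes names in A are unique: map each name to its year once.
year_by_name = dict(zip(A["name"], A["year"]))

# Single pass over B instead of looping over every (A row, B row) pair.
ind_list = [
    idx
    for idx, name, text in zip(B.index, B["name"], B["text"])
    if name in year_by_name and str(year_by_name[name]) in text
]
print(ind_list)  # [1, 4]
```

The dict lookup replaces the inner loop over A, so the number of row comparisons grows with len(A) + len(B) instead of len(A) * len(B).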

dfhwze
asked Aug 6, 2019 at 7:15

  • I have added the python tag; it should always accompany a python-* tag. Commented Aug 6, 2019 at 10:20

1 Answer


Here is what we start with.

In [16]: df1
Out[16]:
 name year
0 ram 1873
1 rob 1900
In [17]: df2
Out[17]:
 name text
0 ram A good kid
1 ram He was born on 1873
2 rob He is tall
3 rob He is 12 yrs old
4 rob His father died at 1900

What you probably want to do is merge your two DataFrames. If you're familiar with SQL, this is just like a table join: pd.merge "adds" the columns of df1 to df2 wherever the two DataFrames match on the "name" column. Because how="left", every row of df2 is kept, even if its name has no match in df1. Once each row has the columns you need ("year" and "text") lined up by "name", we apply the function lambda x: str(x.year) in x.text, which checks whether the year appears in the text, across the rows (axis=1).

In [18]: cond = pd.merge(
 ...: left=df2,
 ...: right=df1,
 ...: how="left",
 ...: left_on="name",
 ...: right_on="name",
 ...: ).apply(lambda x: str(x.year) in x.text, axis=1)

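This gives us a boolean Series with one entry per row of your second DataFrame, telling you whether your desired condition is met. (pd.merge builds a fresh 0..n-1 index; it coincides with df2's index here because df2 uses the default RangeIndex and each name appears only once in df1.)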

In [19]: cond
Out[19]:
0 False
1 True
2 False
3 False
4 True
dtype: bool

Then we filter the Series down to the entries where the condition is True and take its index, optionally converting it to a list.

In [20]: cond[cond].index
Out[20]: Int64Index([1, 4], dtype='int64')
In [21]: cond[cond].index.tolist()
Out[21]: [1, 4]

If all you need later on is to iterate over the indices you've gotten, In [18] and In [20] will suffice.
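pd.merge returns a new 0..n-1 index, so cond lines up with df2's labels only while df2 keeps its default RangeIndex. A hedged sketch for the case where df2 has other labels, assuming names in df1 are unique and every name in df2 appears in df1: look the years up with Series.map, which keeps df2's own index, then use the result either for its labels or directly as a row mask. The labels 10 to 14 are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["ram", "rob"], "year": [1873, 1900]})
# Same rows as in the question, but with hypothetical non-default labels.
df2 = pd.DataFrame(
    {
        "name": ["ram", "ram", "rob", "rob", "rob"],
        "text": ["A good kid", "He was born on 1873", "He is tall",
                 "He is 12 yrs old", "His father died at 1900"],
    },
    index=[10, 11, 12, 13, 14],
)

# Look up each row's year by name. Series.map needs unique names in df1,
# and unlike pd.merge it keeps df2's index.
years = df2["name"].map(df1.set_index("name")["year"])

# Same per-row check as the apply(axis=1) version, as a list comprehension.
cond = pd.Series(
    [str(y) in t for y, t in zip(years, df2["text"])],
    index=df2.index,
)

print(cond[cond].index.tolist())  # [11, 14]
print(df2[cond])                  # the matching rows themselves
```

Since cond shares df2's index, df2[cond] selects the matching rows without first collecting their labels.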

answered Aug 7, 2019 at 2:02
  • Thanks, this is good. But when I apply it to different dataframes, about a third of the rows in cond are NaN. What could be the reason? Commented Aug 12, 2019 at 14:24
  • I can't know for sure, but if, for example, df2 contains names not present in df1, those rows will probably get NaN in the year column during the join/merge. Depending on your use case, you might treat those rows differently, but if I interpret your question strictly, replacing str(x.year) in x.text with (str(int(x.year)) in x.text) if not pd.isnull(x.year) else False is probably the best way to go. It converts the NaN rows to False so they don't appear in the final list of indices, and the int() is needed because the NaNs turn the year column into floats, so str(x.year) would give "1873.0" instead of "1873". Commented Aug 13, 2019 at 0:31
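A minimal sketch of the NaN-safe condition from the comment above, assuming a hypothetical name "sam" that exists in df2 but not in df1. It shows why the left merge turns the years into floats and how the suggested lambda maps the unmatched row to False.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["ram", "rob"], "year": [1873, 1900]})
# "sam" is hypothetical: it has no entry in df1, so its merged year is NaN.
df2 = pd.DataFrame({
    "name": ["ram", "rob", "sam"],
    "text": ["He was born on 1873", "His father died at 1900", "Born 1873"],
})

# on="name" is shorthand for left_on="name", right_on="name".
merged = pd.merge(left=df2, right=df1, how="left", on="name")
print(merged["year"].dtype)  # float64: the NaN forces the int years to float

# NaN year -> False; int() drops the ".0" that str() would give a float.
cond = merged.apply(
    lambda x: (str(int(x.year)) in x.text) if not pd.isnull(x.year) else False,
    axis=1,
)
print(cond[cond].index.tolist())  # [0, 1]
```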
