3
\$\begingroup\$

I have two pandas Series and a nested for loop that checks whether each value of the first Series occurs in the values of the second. This is slow, and I cannot work out how to replace it with a pandas method. I tried the apply function, but could not get it to work with method chaining. My original nested for loops, which work, look like this:

for x in df_one['ser_one']:
    print(x)
    for y in df_two['ser_two']:
        if 'MBTS' not in y and x in y:
            if 'L' in y:
                print(y)

Is there a way to make this less time consuming?

Here is what I attempted using apply:

df_two['ser_two'].apply(lambda x: x if 'MBTS' not in df_one['ser_one'].apply(lambda y:y) and x in df_one['ser_one'].apply(lambda y:y))

Example input:

df_one.head()
Out[136]: 
   type   ser_one
0  MBTS  VUMX1234
1  MBTS  VUMX6436
2  MBTS  VUMX5745
3  MBTS  VUMX5802
4  MBTS  VUMX8091
df_two.head()
Out[137]: 
     ser_two
0   VUMX8091
1  VUMX8091L
2   VUMX1234
3  VUMX1234L
4   VUMX5838
asked Apr 10, 2019 at 10:52
\$\endgroup\$
2
  • \$\begingroup\$ Can you add some example input? \$\endgroup\$ Commented Apr 10, 2019 at 13:00
  • 1
    \$\begingroup\$ @Graipher have added. \$\endgroup\$ Commented Apr 10, 2019 at 13:32

1 Answer 1

1
\$\begingroup\$

Disclaimer: I am not the best at pandas, and I'm sure there is a more readable way to accomplish this, but the following will rid you of your for loops and nested if statements, which are slower than vectorized numpy/pandas operations.

Your filter if 'MBTS' not in y won't work the way you think it will, at least given the limited sample input, because y is a value taken from the column ser_two, not from type. Let's assume that's an easy fix, so in pseudocode it should be something like:

for x in df_one.ser_one:
    for y in df_two: # iterate through the rows so you get both columns
        if 'MBTS' not in y.type and x in y.ser_two:
            if 'L' not in y.ser_two:
                print(y)

This is a bit clunky, and pandas is great for vectorizing these sorts of operations, so let's filter it down to just Series operations. I'm working with a small part of your dataframes, so as a sanity check, they look like

df_one
     ser_one  type
0   VUMX1234  MBTS
1   VUMX6436  MBTS
2   VUMX5745  MBTS
3   VUMX5802  MBTS
4   VUMX8091  MBTS
5   VUMX1234  XXXX
6  VUMX1234L  XXXX
df_two
     ser_two
0   VUMX8091
1  VUMX8091L
2   VUMX1234
3  VUMX1234L
4   VUMX5838

I added a few entries that were non-MBTS to fit your problem.
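For anyone who wants to follow along, here is a minimal sketch that builds these two frames. The values are copied from the tables above; the column order and the default integer index are assumptions on my part.

```python
import pandas as pd

# Sample data copied from the tables above; rows 5 and 6 are the
# non-MBTS entries added for illustration.
df_one = pd.DataFrame({
    "ser_one": ["VUMX1234", "VUMX6436", "VUMX5745", "VUMX5802",
                "VUMX8091", "VUMX1234", "VUMX1234L"],
    "type": ["MBTS"] * 5 + ["XXXX"] * 2,
})
df_two = pd.DataFrame({
    "ser_two": ["VUMX8091", "VUMX8091L", "VUMX1234", "VUMX1234L", "VUMX5838"],
})

print(df_one)
print(df_two)
```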

First, you want the rows where df_one.type is not 'MBTS', and you want to filter the entire dataframe on that condition rather than a single column. df.loc gives you the rows that pass a given filter:

df_one.loc[df_one['type'] == 'MBTS']
    ser_one  type
0  VUMX1234  MBTS
1  VUMX6436  MBTS
2  VUMX5745  MBTS
3  VUMX5802  MBTS
4  VUMX8091  MBTS
# or
df_one.loc[df_one['type'] != 'MBTS']
     ser_one  type
5   VUMX1234  XXXX
6  VUMX1234L  XXXX
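To make the filtering step more concrete, here is a small sketch of what the comparison itself produces. The names is_mbts and non_mbts are only for illustration.

```python
import pandas as pd

df_one = pd.DataFrame({
    "ser_one": ["VUMX1234", "VUMX6436", "VUMX5745", "VUMX5802",
                "VUMX8091", "VUMX1234", "VUMX1234L"],
    "type": ["MBTS"] * 5 + ["XXXX"] * 2,
})

# Comparing a column to a scalar yields a boolean Series that shares
# df_one's index: one True/False per row.
is_mbts = df_one["type"] == "MBTS"
print(is_mbts.tolist())

# .loc keeps the rows where the mask is True; ~ inverts the mask.
non_mbts = df_one.loc[~is_mbts]
print(non_mbts)
```

Keeping the mask in a named variable tends to read better than repeating the comparison inline, and it can be combined with other masks using & and |.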

Now you can check whether those ser_one values are contained in ser_two. Selecting the ser_one column from the filtered frame gives a Series, which has an isin method:

df_one.loc[df_one['type'] != 'MBTS']['ser_one'].isin(df_two['ser_two'])
5    True
6    True
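One detail worth keeping in mind: isin tests exact membership, whereas the x in y test in the original loops is a substring test on strings. The sketch below shows the difference; "VUMX123" is a made-up value used only to demonstrate it. It also selects rows and column in a single .loc call, which avoids chaining [] after .loc.

```python
import pandas as pd

df_one = pd.DataFrame({
    "ser_one": ["VUMX1234", "VUMX6436", "VUMX5745", "VUMX5802",
                "VUMX8091", "VUMX1234", "VUMX1234L"],
    "type": ["MBTS"] * 5 + ["XXXX"] * 2,
})
df_two = pd.DataFrame({
    "ser_two": ["VUMX8091", "VUMX8091L", "VUMX1234", "VUMX1234L", "VUMX5838"],
})

# Row filter and column selection in one .loc call.
candidates = df_one.loc[df_one["type"] != "MBTS", "ser_one"]

# isin compares whole values, element by element.
in_two = candidates.isin(df_two["ser_two"])
print(in_two)

# "VUMX123" (hypothetical) is a substring of entries in ser_two,
# but it is not an exact member, so isin reports False.
partial = pd.Series(["VUMX123"]).isin(df_two["ser_two"])
print(partial.tolist())
```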

Combine the two conditions into a single boolean mask and pass it to .loc, and you should be left with two records in this example:

df_one.loc[(df_one['type'] != 'MBTS') & df_one['ser_one'].isin(df_two['ser_two'])]
     ser_one  type
5   VUMX1234  XXXX
6  VUMX1234L  XXXX
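When going from a boolean Series back to rows, it matters whether the mask itself or its .index is passed to .loc. The .index of a boolean Series lists every row that was tested, including the False ones. A small sketch, where "VUMX9999" is a made-up value that deliberately does not match:

```python
import pandas as pd

# "VUMX9999" is a hypothetical non-matching value.
s = pd.Series(["VUMX1234", "VUMX9999"], index=[5, 6])
known = ["VUMX1234", "VUMX8091"]

mask = s.isin(known)

# The index covers all tested rows, regardless of True/False.
all_tested = s.loc[mask.index]

# Passing the boolean mask keeps only the True rows.
only_matches = s.loc[mask]

# If index labels are really needed, filter the mask by itself first.
matching_labels = mask[mask].index

print(all_tested.tolist())
print(only_matches.tolist())
print(matching_labels.tolist())
```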

It might be a bit easier to do the filtering against any ser_one that contains 'L' ahead of time:

df_one[~df_one['ser_one'].str.contains("L")]
    ser_one  type
0  VUMX1234  MBTS
1  VUMX6436  MBTS
2  VUMX5745  MBTS
3  VUMX5802  MBTS
4  VUMX8091  MBTS
5  VUMX1234  XXXX
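A couple of options on str.contains may be worth knowing here. By default the pattern is treated as a regular expression, and missing values propagate as missing. The sketch below passes regex=False and na=False to get a plain substring test with a clean boolean result. It also shows str.endswith as a stricter alternative, on the assumption (mine, not stated in the question) that the 'L' is always a suffix. The sample values are made up.

```python
import pandas as pd

# Hypothetical values: one with an L suffix, one with a leading L,
# and one missing entry.
ser = pd.Series(["VUMX1234", "VUMX1234L", "LVMX0001", None])

# Literal substring test; missing values count as "does not contain".
contains_l = ser.str.contains("L", regex=False, na=False)

# Stricter: only values that end with "L".
ends_with_l = ser.str.endswith("L", na=False)

print(contains_l.tolist())
print(ends_with_l.tolist())
```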

Now, combining all of that into one big expression:

df_one.loc[
    ~df_one['ser_one'].str.contains("L")
    & (df_one['type'] != 'MBTS')
    & df_one['ser_one'].isin(df_two['ser_two'])
]
    ser_one  type
5  VUMX1234  XXXX

The outer loc takes the boolean mask built by combining the three conditions with &, and keeps only the rows where all of them are True. The comparison needs parentheses because & binds more tightly than !=. The rest is just chained filters, which are native pandas operations and generally faster than explicit Python loops.
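If the substring semantics of the original loops are what is actually wanted (x in y, 'L' in y, and 'MBTS' not in y, all tested against the values of ser_two), isin will not reproduce them, since it matches whole values. One possible sketch, which swaps in a different technique from the one above, joins the ser_one values into a single escaped regular expression and uses str.contains. It uses the question's sample data, and the names pattern, mask and matches are only for illustration.

```python
import re

import pandas as pd

df_one = pd.DataFrame({
    "type": ["MBTS"] * 5,
    "ser_one": ["VUMX1234", "VUMX6436", "VUMX5745", "VUMX5802", "VUMX8091"],
})
df_two = pd.DataFrame({
    "ser_two": ["VUMX8091", "VUMX8091L", "VUMX1234", "VUMX1234L", "VUMX5838"],
})

# One alternation pattern that matches any ser_one value as a substring.
# re.escape guards against values containing regex metacharacters.
pattern = "|".join(re.escape(value) for value in df_one["ser_one"])

mask = (
    df_two["ser_two"].str.contains(pattern, regex=True)
    & df_two["ser_two"].str.contains("L", regex=False)
    & ~df_two["ser_two"].str.contains("MBTS", regex=False)
)
matches = df_two.loc[mask, "ser_two"]
print(matches.tolist())
```

Unlike the loops, this reports each matching ser_two value once and does not say which ser_one value it matched. With a very large ser_one the joined pattern can become long, so it is worth timing on real data before relying on it.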

answered Apr 10, 2019 at 19:31
\$\endgroup\$
