3
\$\begingroup\$

I have the following code which takes 10-15 minutes to execute. That is way too slow considering that my database is growing daily. Is there any chance to make it faster?

# Replace all empty lists ([], '[]') in dataframe with NaN's
df = df.mask(df.applymap(str).eq('[]'))
# Replace all zeros in dataframe with NaN's
df[df == 0.0] = np.nan
# Replace empty strings in dataframe with NaN's
df.replace('', np.nan, inplace=True)
# Replace all strings with value 'null' in a dataframe with NaN
df.replace('null', np.nan, inplace=True)
asked May 8, 2017 at 13:09
\$\endgroup\$
2
  • 1
    \$\begingroup\$ This seems like it could belong on Stack Overflow as it is a specific problem. \$\endgroup\$ Commented May 8, 2017 at 13:34
  • \$\begingroup\$ What does your data actually look like? Is it a single column of lists of unknown length? \$\endgroup\$ Commented May 8, 2017 at 15:19

2 Answers

2
\$\begingroup\$

There is a built-in method, where, that replaces values based on a condition, so all the checks can be combined into one mask:

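# applymap(str) turns 0.0 into "0.0" and 0 into "0", so the zeros must be listed as strings
# (pandas 2.1+ deprecates applymap in favour of DataFrame.map)
mask = df.applymap(str).isin(["[]", "0", "0.0", "", "null"])
df = df.where(~mask, other=np.nan)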

This is only about 25% quicker than the original code.

answered May 8, 2017 at 13:40
\$\endgroup\$
2
  • \$\begingroup\$ Thanks, makes sense but I am getting an error: TypeError: unhashable type: 'list'. Any ideas? \$\endgroup\$ Commented May 8, 2017 at 13:51
  • \$\begingroup\$ Oh yes, that's a bit annoying; you may have to check the string version, as you did in your original code. \$\endgroup\$ Commented May 8, 2017 at 14:03
4
\$\begingroup\$

There is not a lot to review here.

The code is well documented and readable, the only thing I frowned at was df[df == 0.0], but my Python is probably just not good enough.

I can only think of the fact that replace can take a list of strings for to_replace, so you can merge the last two statements, which should give a small speed-up because the dataframe is scanned once instead of twice.
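
As a minimal sketch of that merge, the two string replacements from the question can be passed as one list. The sample dataframe is made up for illustration and is not the asker's data; how much time this saves on the real data has not been measured here.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame standing in for the asker's data
df = pd.DataFrame({
    "a": ["", "x", "null"],
    "b": ["null", "", "y"],
})

# One call with a list for to_replace instead of two separate replace calls
df = df.replace(["", "null"], np.nan)
```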

From a design perspective, all the routines writing to your database should never write empty lists, zeros, empty strings or nulls, but NaN instead. Then you would never have to run this script in the first place.
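
A rough sketch of what normalizing at write time could look like is below. The names normalize_value, normalize_record and EMPTY_STRINGS are hypothetical, the records are assumed to be plain dicts, and treating None as missing and leaving booleans alone are my own assumptions; how NaN is then stored depends on the database in use.

```python
import math

# Hypothetical set of "empty" string markers, taken from the replacements in the question
EMPTY_STRINGS = {"", "null", "[]"}


def normalize_value(value):
    """Return NaN for values the question treats as missing, else the value unchanged."""
    if value is None:  # assumption: a real null should also become NaN
        return math.nan
    if isinstance(value, str):
        return math.nan if value in EMPTY_STRINGS else value
    if isinstance(value, list) and len(value) == 0:
        return math.nan
    # Assumption: bool is excluded so that False is not mistaken for the number 0
    if isinstance(value, (int, float)) and not isinstance(value, bool) and value == 0:
        return math.nan
    return value


def normalize_record(record):
    """Normalize every field of a dict-like record before it is stored."""
    return {key: normalize_value(val) for key, val in record.items()}


# Example record with made-up field names
row = normalize_record({"name": "", "tags": [], "score": 0.0, "city": "Oslo", "n": 3})
```

Calling normalize_record in every routine that writes to the database would make the clean-up script unnecessary for new rows; existing rows would still need a one-off pass.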

Glorfindel
answered May 8, 2017 at 13:29
\$\endgroup\$
1
  • \$\begingroup\$ @user137913 Are you maintaining the database? \$\endgroup\$ Commented May 8, 2017 at 14:07
