9
\$\begingroup\$

I am dropping rows from a pandas DataFrame when certain columns contain the value 0. The code below produces the output I want, but I hope the same can be done with less code, perhaps in a single line.

df:

    A  B  C
 0  1  2  5
 1  4  4  0
 2  6  8  4
 3  0  4  2

My code:

 drop_A=df.index[df["A"] == 0].tolist()
 drop_B=df.index[df["C"] == 0].tolist()
 c=drop_A+drop_B
 df=df.drop(df.index[c])

[out]

    A  B  C
 0  1  2  5
 2  6  8  4
asked Jan 18, 2018 at 11:19
\$\endgroup\$
2
  • \$\begingroup\$ Do you want to know a better way to do what your code is doing, or do you want us to code golf it? \$\endgroup\$ Commented Jan 18, 2018 at 11:27
  • \$\begingroup\$ I need a better way \$\endgroup\$ Commented Jan 18, 2018 at 11:27

2 Answers

13
\$\begingroup\$

I think you need to create a boolean DataFrame by comparing the selected columns against the scalar 0 for inequality, and then use all(axis=1) to keep only the rows where every value is True:

df = df[(df[['A','C']] != 0).all(axis=1)]
print (df)
   A  B  C
0  1  2  5
2  6  8  4

Details:

print (df[['A','C']] != 0)
       A      C
0   True   True
1   True  False
2   True   True
3  False   True
print ((df[['A','C']] != 0).all(axis=1))
0     True
1    False
2     True
3    False
dtype: bool

Alternatively, create a boolean DataFrame by comparing the selected columns to 0 for equality, use any(axis=1) to flag the rows that contain at least one zero, and finally invert the mask with ~:

df = df[~(df[['A','C']] == 0).any(axis=1)]

Details:

print (df[['A','C']])
   A  C
0  1  5
1  4  0
2  6  4
3  0  2
print (df[['A','C']] == 0)
       A      C
0  False  False
1  False   True
2  False  False
3   True  False
print ((df[['A','C']] == 0).any(axis=1))
0    False
1     True
2    False
3     True
dtype: bool
print (~(df[['A','C']] == 0).any(axis=1))
0     True
1    False
2     True
3    False
dtype: bool
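
The two masks above should select exactly the same rows, since "all values are non-zero" and "not any value is zero" are the same condition by De Morgan's law. A minimal sketch to check this on the sample data from the question (the names cols, mask_all, mask_not_any, same and result are only illustrative):

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'A': [1, 4, 6, 0], 'B': [2, 4, 8, 4], 'C': [5, 0, 4, 2]})
cols = ['A', 'C']  # illustrative name: the columns to check for zeros

# Mask 1: every selected value in the row is non-zero.
mask_all = (df[cols] != 0).all(axis=1)

# Mask 2: no selected value in the row is zero.
mask_not_any = ~(df[cols] == 0).any(axis=1)

# Series.equals compares the two masks element by element.
same = mask_all.equals(mask_not_any)

# Either mask can be used for boolean indexing.
result = df[mask_all]
print(same)
print(result)
```

Which form to use is mostly a matter of which one reads more naturally to you.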
answered Jan 18, 2018 at 11:28
\$\endgroup\$
4
  • \$\begingroup\$ Jezrael , I want to consider only column A and C , pls check my question once \$\endgroup\$ Commented Jan 18, 2018 at 11:31
  • \$\begingroup\$ @pyd Clarify this in your question. \$\endgroup\$ Commented Jan 18, 2018 at 11:39
  • \$\begingroup\$ You have both "all not equal to 0" and "not any equal to zero". Did you intend these to be two options, or did you accidentally post two solutions? \$\endgroup\$ Commented Jan 18, 2018 at 17:51
  • \$\begingroup\$ @Accumulation No, it was no accident. I post first the best solution and second very nice, the best 2. :) \$\endgroup\$ Commented Jan 18, 2018 at 18:03
5
\$\begingroup\$

A one-line hack using .dropna():

import numpy as np
import pandas as pd
df = pd.DataFrame({'A':[1,4,6,0],'B':[2,4,8,4],'C':[5,0,4,2]})
print(df)
   A  B  C
0  1  2  5
1  4  4  0
2  6  8  4
3  0  4  2
columns = ['A', 'C']
df = df.replace(0, np.nan).dropna(axis=0, how='any', subset=columns).fillna(0).astype(int)
print(df)
   A  B  C
0  1  2  5
2  6  8  4

So, what's happening is:

  1. Replace 0 by NaN with .replace()
  2. Use .dropna() to drop NaN considering only columns A and C
  3. Replace NaN back to 0 with .fillna() (not needed if you use all columns instead of only a subset)
  4. Correct the data type from float to int with .astype()
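
The four steps above can be sketched one at a time to make the intermediate results visible. This is only an illustration on the sample data, with NumPy imported directly for np.nan; the names step1 to step4 are made up for the example. Note that the chain assumes the data has no NaN of its own: any pre-existing NaN outside the checked columns would also be turned into 0 by .fillna(0), and .astype(int) converts every column to int.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 4, 6, 0], 'B': [2, 4, 8, 4], 'C': [5, 0, 4, 2]})
columns = ['A', 'C']

# 1. Replace every 0 in the frame with NaN.
step1 = df.replace(0, np.nan)

# 2. Drop rows that have NaN in any of the checked columns.
step2 = step1.dropna(axis=0, how='any', subset=columns)

# 3. Turn NaN left in the other columns back into 0.
step3 = step2.fillna(0)

# 4. Columns that held NaN became float; cast everything back to int.
step4 = step3.astype(int)

print(step1)
print(step4)
```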
answered Jan 23, 2018 at 9:08
\$\endgroup\$
3
  • \$\begingroup\$ Nice! I was hoping there was .dropna() hack to be had... good one paulo! \$\endgroup\$ Commented Apr 5, 2019 at 21:59
  • \$\begingroup\$ Just tried this and received the following: FutureWarning: The pandas.np module is deprecated and will be removed from pandas in a future version. Import numpy directly instead \$\endgroup\$ Commented Oct 6, 2020 at 19:15
  • \$\begingroup\$ Sure, just import numpy directly (import numpy as np) and replace pd.np.nan with np.nan instead. \$\endgroup\$ Commented Oct 6, 2020 at 19:35
