I am dropping rows from a pandas DataFrame when certain columns have a value of 0. I got the desired output with the code below, but I hope the same can be done with less code, perhaps in a single line.
df:
A B C
0 1 2 5
1 4 4 0
2 6 8 4
3 0 4 2
My code:
drop_A=df.index[df["A"] == 0].tolist()
drop_B=df.index[df["C"] == 0].tolist()
c=drop_A+drop_B
df=df.drop(df.index[c])
[out]
A B C
0 1 2 5
2 6 8 4
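For reference, here is a minimal, self-contained sketch of the approach above. The DataFrame construction is assumed from the sample data, and the final drop is applied directly to the collected labels, which is equivalent here because the index is the default RangeIndex:
import pandas as pd

df = pd.DataFrame({'A': [1, 4, 6, 0], 'B': [2, 4, 8, 4], 'C': [5, 0, 4, 2]})

drop_A = df.index[df["A"] == 0].tolist()  # row labels where column A is 0 -> [3]
drop_B = df.index[df["C"] == 0].tolist()  # row labels where column C is 0 -> [1]
df = df.drop(drop_A + drop_B)             # drop both sets of labels
print(df)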
- Do you want to know a better way to do what your code is doing, or do you want us to code golf it? – Peilonrayz, Jan 18, 2018
- I need a better way. – pyd, Jan 18, 2018
2 Answers
You need to create a boolean DataFrame by comparing the filtered columns with the scalar 0 for inequality, and then check that all values per row are True with all:
df = df[(df[['A','C']] != 0).all(axis=1)]
print (df)
A B C
0 1 2 5
2 6 8 4
Details:
print (df[['A','C']] != 0)
A C
0 True True
1 True False
2 True True
3 False True
print ((df[['A','C']] != 0).all(axis=1))
0 True
1 False
2 True
3 False
dtype: bool
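If the zero check should apply to every column rather than only A and C, the same pattern works without selecting a subset. This generalization is a sketch added for illustration, not part of the original answer:
import pandas as pd

df = pd.DataFrame({'A': [1, 4, 6, 0], 'B': [2, 4, 8, 4], 'C': [5, 0, 4, 2]})

# Keep only rows in which no column contains a 0
df = df[(df != 0).all(axis=1)]
print(df)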
Alternatively, create a boolean DataFrame by comparing the values with the scalar 0 for equality, check whether any value per row is True with any, and finally invert the mask with ~:
df = df[~(df[['A','C']] == 0).any(axis=1)]
Details:
print (df[['A','C']])
A C
0 1 5
1 4 0
2 6 4
3 0 2
print (df[['A','C']] == 0)
A C
0 False False
1 False True
2 False False
3 True False
print ((df[['A','C']] == 0).any(axis=1))
0 False
1 True
2 False
3 True
dtype: bool
print (~(df[['A','C']] == 0).any(axis=1))
0 True
1 False
2 True
3 False
dtype: bool
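For readers who prefer method calls over comparison operators, both masks can also be written with the DataFrame methods ne and eq. This rephrasing is an addition for illustration, not part of the answer:
import pandas as pd

df = pd.DataFrame({'A': [1, 4, 6, 0], 'B': [2, 4, 8, 4], 'C': [5, 0, 4, 2]})

# Equivalent to (df[['A','C']] != 0).all(axis=1)
keep = df[['A', 'C']].ne(0).all(axis=1)

# Equivalent to ~(df[['A','C']] == 0).any(axis=1)
also_keep = ~df[['A', 'C']].eq(0).any(axis=1)

print(df[keep])
print(keep.equals(also_keep))  # True -- both masks select the same rows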
- Jezrael, I want to consider only columns A and C; please check my question again. – pyd, Jan 18, 2018
- @pyd Clarify this in your question. – Jan 18, 2018
- You have both "all not equal to 0" and "not any equal to zero". Did you intend these to be two options, or did you accidentally post two solutions? – Acccumulation, Jan 18, 2018
- @Acccumulation No, it was not an accident. I posted the best solution first and the second-best one after it. :) – jezrael, Jan 18, 2018
A one-line hack using .dropna():
import pandas as pd
df = pd.DataFrame({'A':[1,4,6,0],'B':[2,4,8,4],'C':[5,0,4,2]})
print(df)
A B C
0 1 2 5
1 4 4 0
2 6 8 4
3 0 4 2
columns = ['A', 'C']
df = df.replace(0, pd.np.nan).dropna(axis=0, how='any', subset=columns).fillna(0).astype(int)
print(df)
A B C
0 1 2 5
2 6 8 4
So, what's happening is:
- Replace 0 by NaN with .replace()
- Use .dropna() to drop NaN, considering only columns A and C
- Replace NaN back to 0 with .fillna() (not needed if you use all columns instead of only a subset)
- Correct the data type from float to int with .astype()
- Nice! I was hoping there was a .dropna() hack to be had... good one, paulo! – killian95, Apr 5, 2019
- Just tried this and received the following: FutureWarning: The pandas.np module is deprecated and will be removed from pandas in a future version. Import numpy directly instead. – JMVDA, Oct 6, 2020
- Sure, just import numpy directly (import numpy as np) and replace pd.np.nan with np.nan instead. – paulo.filip3, Oct 6, 2020
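Putting that comment into practice, here is the same one-liner as a sketch with numpy imported directly, assuming the example DataFrame from the answer:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 4, 6, 0], 'B': [2, 4, 8, 4], 'C': [5, 0, 4, 2]})

columns = ['A', 'C']
# np.nan replaces the deprecated pd.np.nan; the rest of the chain is unchanged
df = df.replace(0, np.nan).dropna(axis=0, how='any', subset=columns).fillna(0).astype(int)
print(df)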