dask.dataframe.DataFrame.dropna

DataFrame.dropna(how=<no_default>, subset=None, thresh=<no_default>)

Remove missing values.

This docstring was copied from pandas.core.frame.DataFrame.dropna.

Some inconsistencies with the Dask version may exist.

See the User Guide for more on which values are considered missing, and how to work with missing data.

Parameters:
axis : {0 or ‘index’, 1 or ‘columns’}, default 0 (Not supported in Dask)

Determine if rows or columns which contain missing values are removed.

  • 0, or ‘index’ : Drop rows which contain missing values.

  • 1, or ‘columns’ : Drop columns which contain missing values.

Only a single axis is allowed.

how : {‘any’, ‘all’}, default ‘any’

Determine whether a row or column is removed from the DataFrame when it contains at least one NA value or only NA values.

  • ‘any’ : If any NA values are present, drop that row or column.

  • ‘all’ : If all values are NA, drop that row or column.
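The difference between the two settings can be sketched on a small pandas DataFrame. The data below is hypothetical and chosen so that one row is partly missing and one row is entirely missing; this illustrates the semantics described above and says nothing about Dask internals.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 is complete, row 1 is partly missing,
# row 2 is entirely missing.
df = pd.DataFrame({"a": [1.0, np.nan, np.nan],
                   "b": [4.0, 5.0, np.nan]})

# how="any": a single NA is enough to drop the row.
any_dropped = df.dropna(how="any")

# how="all": a row is dropped only if every value in it is NA.
all_dropped = df.dropna(how="all")

print(list(any_dropped.index))
print(list(all_dropped.index))
```

With this data, `how="any"` keeps only the complete row, while `how="all"` also keeps the partly missing one.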

thresh : int, optional

Keep only the rows or columns that have at least this many non-NA values. Cannot be combined with how.
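As a minimal sketch of how the threshold behaves, the following pandas snippet counts non-NA values per row and compares two thresholds. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows have 3, 1, and 0 non-NA values respectively.
df = pd.DataFrame({"a": [1.0, np.nan, np.nan],
                   "b": [4.0, 5.0, np.nan],
                   "c": [7.0, np.nan, np.nan]})

# Number of non-NA values in each row, for reference.
non_na_counts = df.notna().sum(axis=1)

# Keep rows with at least 1 non-NA value, then at least 2.
at_least_one = df.dropna(thresh=1)
at_least_two = df.dropna(thresh=2)

print(list(non_na_counts))
print(list(at_least_one.index))
print(list(at_least_two.index))
```

A row survives when its non-NA count is greater than or equal to `thresh`, so raising the threshold can only remove more rows.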

subset : column label or sequence of labels, optional

Labels along the other axis to consider. For example, if you are dropping rows, these would be a list of columns to check for missing values.
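A short sketch of `subset`, using hypothetical data in pandas: only the listed columns are inspected, so a missing value in any other column does not cause a row to be dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 is missing "toy", row 1 is missing "name".
df = pd.DataFrame({"name": ["Alfred", None, "Catwoman"],
                   "toy": [np.nan, "Batmobile", "Bullwhip"]})

# Only "name" is checked; the NaN in "toy" on row 0 is ignored.
by_name = df.dropna(subset=["name"])

# Only "toy" is checked; the missing "name" on row 1 is ignored.
by_toy = df.dropna(subset=["toy"])

print(list(by_name.index))
print(list(by_toy.index))
```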

inplace : bool, default False (Not supported in Dask)

Whether to modify the DataFrame rather than creating a new one.

ignore_index : bool, default False (Not supported in Dask)

If True, the resulting axis will be labeled 0, 1, ..., n - 1.

Added in version 2.0.0.

Returns:
DataFrame or None

DataFrame with NA entries dropped from it, or None if inplace=True (inplace is not supported in Dask).

See also

DataFrame.isna

Indicate missing values.

DataFrame.notna

Indicate existing (non-missing) values.

DataFrame.fillna

Replace missing values.

Series.dropna

Drop missing values.

Index.dropna

Drop missing indices.

Examples

>>> df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
...                    "toy": [np.nan, 'Batmobile', 'Bullwhip'],
...                    "born": [pd.NaT, pd.Timestamp("1940-04-25"),
...                             pd.NaT]})
>>> df
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Drop the rows where at least one element is missing.

>>> df.dropna()
     name        toy       born
1  Batman  Batmobile 1940-04-25

Drop the columns where at least one element is missing.

>>> df.dropna(axis='columns')
       name
0    Alfred
1    Batman
2  Catwoman

Drop the rows where all elements are missing.

>>> df.dropna(how='all')
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Keep only the rows with at least 2 non-NA values.

>>> df.dropna(thresh=2)
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Define in which columns to look for missing values.

>>> df.dropna(subset=['name', 'toy'])
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT