BUG: df.duplicated treats None as np.nan in object columns · Issue #21720 · pandas-dev/pandas

Open

Labels

Bug Missing-data duplicated

Description

h-vetinari opened on Jul 3, 2018

Found while writing tests for .duplicated in #21645 (until now, .duplicated was tested almost exclusively implicitly, through .drop_duplicates).

At first I thought this was intended behaviour for DataFrame.duplicated(), but Series.duplicated() does not behave the same way: it keeps None and np.nan distinct. The Series behaviour makes sense to me, since as objects None is not np.nan, so I have labelled this as a bug.

import numpy as np
import pandas as pd

s = pd.Series([np.nan, 3, 3, None, np.nan], dtype=object)
s
# 0 NaN
# 1 3
# 2 3
# 3 None
# 4 NaN
# dtype: object
s.duplicated()
# 0 False
# 1 False
# 2 True
# 3 False
# 4 True
# dtype: bool
s.to_frame().duplicated()
# 0 False
# 1 False
# 2 True
# 3 True <- differs from s.duplicated(): None is flagged as a duplicate of NaN
# 4 True
# dtype: bool
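To make the distinction in the report concrete, here is a minimal sketch in plain Python of a duplicate check that keeps None and NaN apart, which is the behaviour the Series output above shows. The helpers `missing_kind` and `duplicated_strict` are hypothetical names introduced for illustration only. They are not part of pandas and do not reflect how pandas implements `.duplicated`. The sketch also assumes every value is hashable.

```python
import numpy as np
import pandas as pd


def missing_kind(value):
    """Hypothetical helper: classify a scalar as 'None', 'NaN', or 'value'."""
    if value is None:
        return "None"
    if isinstance(value, float) and np.isnan(value):
        return "NaN"
    return "value"


def duplicated_strict(values):
    """Hypothetical helper: flag repeats while keeping None and NaN distinct.

    Assumes all values are hashable. All NaNs are treated as equal to each
    other, but never as equal to None.
    """
    seen = set()
    flags = []
    for v in values:
        kind = missing_kind(v)
        # Missing values are keyed by their kind only, so every NaN shares
        # one key and None has a separate key of its own.
        key = (kind, v if kind == "value" else None)
        flags.append(key in seen)
        seen.add(key)
    return flags


s = pd.Series([np.nan, 3, 3, None, np.nan], dtype=object)

# As Python objects the two missing markers are distinct ...
assert None is not np.nan
# ... even though pandas reports both as missing.
assert pd.isna(None) and pd.isna(np.nan)

kinds = [missing_kind(v) for v in s]
flags = duplicated_strict(list(s))
print(kinds)
print(flags)
```

Under these assumptions, position 3 (the None) is not flagged as a duplicate of position 0 (the NaN), while position 4 (the second NaN) is.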

