Unstable hashtable / duplicated algo for object dtype #27035

Open
Labels: Bug, duplicated (duplicated, drop_duplicates)
@jorisvandenbossche

Description

From a flaky test in geopandas, I observed the following behaviour:

In [1]: pd.__version__
Out[1]: '0.25.0.dev0+791.gf0919f272'
In [2]: from shapely.geometry import Point 
In [3]: a = np.array([Point(1, 1), Point(1, 1)], dtype=object) 
In [4]: pd.Series(a).duplicated()
Out[4]: 
0 False
1 True
dtype: bool
In [6]: print(pd.Series(a).duplicated()) 
 ...: print(pd.Series(a).duplicated())
0 False
1 True
dtype: bool
0 False
1 False
dtype: bool

So the same call on the same data sometimes flags the second element as a duplicate and sometimes does not.

I am also not sure how the object hashtable works here (assuming duplicated uses the hashtable), since the shapely Point objects are not hashable:

In [9]: pd.Series(a).unique()
...
TypeError: unhashable type: 'Point'
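
For reference, a minimal sketch of what a deterministic result would look like if duplicates were detected purely by equality, without any hashing. This is not how pandas implements duplicated; FakePoint and duplicated_by_equality are hypothetical names, and FakePoint is only a stand-in for an object that, like the Point above, defines equality but is unhashable.

```python
class FakePoint:
    """Hypothetical stand-in: defines equality but is unhashable."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    # Defining __eq__ without __hash__ sets __hash__ to None,
    # so hash(FakePoint(...)) raises TypeError.
    def __eq__(self, other):
        return isinstance(other, FakePoint) and (self.x, self.y) == (other.x, other.y)


def duplicated_by_equality(values):
    """Flag an element True if an equal element appeared earlier.

    Uses only ==, so it works for unhashable objects, at O(n^2) cost.
    """
    seen = []
    flags = []
    for value in values:
        is_dup = any(value == earlier for earlier in seen)
        flags.append(is_dup)
        if not is_dup:
            seen.append(value)
    return flags


points = [FakePoint(1, 1), FakePoint(1, 1), FakePoint(2, 3)]

# Confirm the stand-in really is unhashable.
try:
    hash(points[0])
    hashable = True
except TypeError:
    hashable = False

flags = duplicated_by_equality(points)
print(hashable, flags)
```

Because the comparison depends only on the values and not on object identity or memory addresses, repeated calls give the same flags every time.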
