String dtype: backwards compatibility of selecting "object" vs "str" columns in `select_dtypes` #61916

Open · Labels: Strings · Milestone: 3.0

jorisvandenbossche opened on Jul 21, 2025

We provide the DataFrame.select_dtypes() method to easily subset columns based on data types (groups). See https://pandas.pydata.org/pandas-docs/version/2.3/user_guide/basics.html#selecting-columns-based-on-dtype

At the moment, as documented, to select string columns you must use the object dtype:

>>> import pandas as pd
>>> pd.options.future.infer_string = False
>>> df = pd.DataFrame(
...     {
...         "string": list("abc"),
...         "int64": list(range(1, 4)),
...     }
... )
>>> df.dtypes
string    object
int64      int64
dtype: object
>>> df.select_dtypes(include=[object])
  string
0      a
1      b
2      c

On current main, with the string dtype enabled, the above dataframe now has a str column, and so selecting object dtype columns gives an empty result. One can use str instead:

>>> pd.options.future.infer_string = True
>>> df = pd.DataFrame(
...     {
...         "string": list("abc"),
...         "int64": list(range(1, 4)),
...     }
... )
>>> df.dtypes
string      str
int64     int64
dtype: object
>>> df.select_dtypes(include=[object])
Empty DataFrame
Columns: []
Index: [0, 1, 2]
>>> df.select_dtypes(include=[str])
  string
0      a
1      b
2      c

On the one hand, that is an "obvious" behaviour change as a consequence of the column now having a different dtype. On the other hand, it will break all code that currently uses select_dtypes to select string columns, and potentially silently, since the call simply no longer selects them.
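To make the silent failure mode concrete, here is a minimal sketch of downstream code that relies on include=[object]. The helper name strip_string_columns and the sample data are made up for illustration; they are not part of pandas. With the str dtype enabled, the selection is empty, so the loop body never runs and nothing is raised.

```python
import pandas as pd


def strip_string_columns(df):
    """Hypothetical helper: strip whitespace from every "string" column.

    Columns are picked with include=[object], the documented way in pandas<=2.3.
    """
    out = df.copy()
    selected = list(out.select_dtypes(include=[object]).columns)
    for col in selected:
        out[col] = out[col].str.strip()
    return out, selected


df = pd.DataFrame({"name": [" a", "b "], "n": [1, 2]})
cleaned, selected = strip_string_columns(df)
# With object-dtype strings, `selected` is ["name"] and the values are stripped.
# With the str dtype enabled, `selected` is [] and `cleaned` equals the input,
# without any error or warning.
print(selected)
```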

How to write compatible code?

One can select both object and string dtypes, so that those columns are selected in both older and newer pandas. One gotcha is that df.select_dtypes(include=[str]) is not allowed in pandas<=2.3 ("string dtypes are not allowed, use 'object' instead"), so you have to use "string" instead of "str" (although the default dtype is str ..). Using "string" selects opt-in nullable string columns as well as the new default str dtype:

# this gives the same result in both infer_string=True or False
>>> df.select_dtypes(include=[object, "string"])
  string
0      a
1      b
2      c
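For code that has to run on several pandas versions, the selection above could be wrapped in a small helper. This is only a sketch: select_string_columns is a hypothetical name, and it relies on the behaviour described in this issue, namely that "string" matches both the nullable string dtype and the new default str dtype.

```python
import pandas as pd


def select_string_columns(df):
    """Hypothetical helper: select string columns on old and new pandas.

    `object` covers the pre-3.0 default; "string" covers the nullable string
    dtype and (per the issue text) the new default str dtype.
    """
    return df.select_dtypes(include=[object, "string"])


df = pd.DataFrame({"string": list("abc"), "int64": list(range(1, 4))})
print(list(select_string_columns(df).columns))
```

Note that include=[object] also picks up object columns that do not hold strings, so the helper is only equivalent to "string columns" when the frame has no such columns.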

TODO: this should be added to the migration guide in https://pandas.pydata.org/docs/dev/user_guide/migration-3-strings.html#the-dtype-is-no-longer-object-dtype (update -> #62403)

Can we make this upgrade experience smoother?

Given that this will essentially break every use case of select_dtypes that involves selecting string columns (and given the fact this is a method, so we are more flexible compared to ser.dtype == object), I am wondering if we should provide some better upgrading behaviour. Some options:

- For now let select_dtypes(include=[object]) keep selecting string columns as well, for backwards compatibility (and we can (later) add a warning we will stop doing that in the future)
- When a user does select_dtypes(include=[object]) in pandas 3.0, and we see that there are str columns, raise a warning mentioning to the user they likely want to do include=[str] instead.

In both cases, it gets annoying if you actually want to select object columns, because then you get a (false positive) warning that you can't really do anything about (except ignoring/suppressing it).

And in any case, we should probably still add a warning to pandas 2.3 about this when the string mode is enabled (in case we do a 2.3.2 release).
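The warning option can be approximated in user code to get a feel for the trade-off. The wrapper below is a sketch under assumptions, not a proposed pandas implementation: select_object_columns and the warning text are made up, and string columns are detected with an isinstance check against pd.StringDtype.

```python
import warnings

import pandas as pd


def select_object_columns(df):
    """Hypothetical user-side wrapper mimicking the proposed warning."""
    string_cols = [
        name for name, dtype in df.dtypes.items()
        if isinstance(dtype, pd.StringDtype)
    ]
    if string_cols:
        warnings.warn(
            f"include=[object] does not select string-dtype columns {string_cols}; "
            "use include=[object, 'string'] to select them as well.",
            UserWarning,
            stacklevel=2,
        )
    return df.select_dtypes(include=[object])


df = pd.DataFrame(
    {
        "text": pd.array(["a", "b"], dtype="string"),  # explicit string dtype
        "mixed": pd.Series([1, "a"], dtype=object),  # genuinely object
        "n": [1, 2],
    }
)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = select_object_columns(df)
```

Here the caller really does want the object column "mixed", yet the warning still fires because a string column is present. That is the false-positive case described above.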

