Change default string storage from "python" to "pyarrow" (if installed) for NA-variant of StringDtype #60287

Closed
Labels: API Design; NA - MaskedArrays (Related to pd.NA and nullable extension arrays); Strings (String extension data type and string data)
Milestone
@jorisvandenbossche

Description

Historically, the default value for the string storage (globally configurable through pd.options.mode.string_storage) of StringDtype was "python", and users needed to explicitly ask for "pyarrow". For example:

>>> ser = pd.Series(["a", "b"], dtype="string")
>>> ser.dtype
string[python]

and this is still the behaviour on main.

For the new NaN-variant of StringDtype, however, we implemented the default string storage option "auto" meaning "use pyarrow if installed, otherwise use python". So on a system with pyarrow installed:

>>> pd.options.future.infer_string = True
>>> ser = pd.Series(["a", "b"], dtype="str")
>>> ser.dtype.storage
'pyarrow'

Essentially we interpret the default string_storage option setting of "auto" differently for the NaN vs NA variant of the string dtype, which you can see in the code here:

if storage is None:
    if na_value is not libmissing.NA:
        storage = get_option("mode.string_storage")
        if storage == "auto":
            if HAS_PYARROW:
                storage = "pyarrow"
            else:
                storage = "python"
    else:
        storage = get_option("mode.string_storage")
        if storage == "auto":
            storage = "python"
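
To make the difference between the two branches easier to see, here is a minimal standalone sketch of the same decision. The function name `resolve_string_storage` and its parameters (`option_value`, `uses_pd_na`, `has_pyarrow`, `unify_defaults`) are hypothetical and not part of pandas; `unify_defaults=True` only illustrates what treating "auto" identically for both variants would look like.

```python
def resolve_string_storage(option_value, uses_pd_na, has_pyarrow, unify_defaults=False):
    """Hypothetical helper mirroring the branch quoted above; not a pandas API.

    option_value   -- value of the mode.string_storage option
    uses_pd_na     -- True for the NA-variant, False for the NaN-variant
    has_pyarrow    -- whether pyarrow is importable
    unify_defaults -- assumption: treat "auto" the same for both variants
    """
    # An explicit "python" or "pyarrow" setting is always respected.
    if option_value != "auto":
        return option_value
    # Current behaviour: the NA-variant maps "auto" to "python".
    if uses_pd_na and not unify_defaults:
        return "python"
    # NaN-variant (or the unified case): use pyarrow only if installed.
    return "pyarrow" if has_pyarrow else "python"


# Same option value and same environment, different result per variant.
print(resolve_string_storage("auto", uses_pd_na=False, has_pyarrow=True))
print(resolve_string_storage("auto", uses_pd_na=True, has_pyarrow=True))
print(resolve_string_storage("auto", uses_pd_na=True, has_pyarrow=True, unify_defaults=True))
```

With pyarrow available, only the middle call resolves to "python", which is the asymmetry described above.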

Proposal: I think it makes sense to also switch to "pyarrow" as the default string storage (if installed) for the nullable StringDtype. This is somewhat of a breaking change, although mostly for the dtype object itself: behaviour-wise, string operations should hardly differ between the two backends. I would therefore keep this for 3.0 and properly document it in the whatsnew notes.
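
For code that depends on the "python" storage, the storage can already be requested explicitly, so it would not depend on the default. A minimal sketch, assuming only that pandas is installed (pyarrow is not needed for this):

```python
import pandas as pd

# Request the python storage explicitly on the dtype object ...
explicit = pd.Series(["a", "b"], dtype=pd.StringDtype(storage="python"))

# ... or pin the global option for a block of code.
with pd.option_context("mode.string_storage", "python"):
    pinned = pd.Series(["a", "b"], dtype="string")

print(explicit.dtype.storage, pinned.dtype.storage)
```

Both series report "python" as their storage regardless of what "auto" resolves to.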
