What’s new in 3.0.0 (Month XX, 2025)#

These are the changes in pandas 3.0.0. See Release notes for a full changelog including other versions of pandas.

Enhancements#

Dedicated string data type by default#

Historically, pandas represented string columns with the NumPy object data type. This representation has numerous problems: it is not specific to strings (any Python object can be stored in an object-dtype array, not just strings), and it is often inefficient in both performance and memory usage.

Starting with pandas 3.0, a dedicated string data type is enabled by default (backed by PyArrow under the hood, if installed, otherwise falling back to being backed by NumPy object-dtype). This means that pandas will start inferring columns containing string data as the new str data type when creating pandas objects, such as in constructors or IO functions.

Old behavior:

>>> ser = pd.Series(["a", "b"])
>>> ser
0    a
1    b
dtype: object

New behavior:

>>> ser = pd.Series(["a", "b"])
>>> ser
0    a
1    b
dtype: str

The string data type used in these scenarios will mostly behave as the NumPy object dtype would, including missing value semantics and general operations on these columns.

The main characteristics of the new string data type:

  • Inferred by default for string data (instead of object dtype)

  • The str dtype can only hold strings (or missing values), in contrast to object dtype (assigning a non-string value raises)

  • The missing value sentinel is always NaN (np.nan) and follows the same missing value semantics as the other default dtypes.

Those intentional changes can have breaking consequences, for example when checking for the .dtype being object dtype or checking the exact missing value sentinel. See the Migration guide for the new string data type (pandas 3.0) for more details on the behaviour changes and how to adapt your code to the new default.
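
For example, code that detects string columns by checking for object dtype will need updating. A minimal sketch of a dtype-agnostic check:

>>> ser = pd.Series(["a", "b"])
>>> ser.dtype == object                        # no longer True under pandas 3.0
False
>>> pd.api.types.is_string_dtype(ser.dtype)    # works for both old and new defaults
True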

Copy-on-Write#

The new "copy-on-write" behaviour in pandas 3.0 brings changes in behavior in how pandas operates with respect to copies and views. A summary of the changes:

  1. The result of any indexing operation (subsetting a DataFrame or Series in any way, including accessing a DataFrame column as a Series) or of any method returning a new DataFrame or Series always behaves as a copy in terms of the user API.

  2. As a consequence, if you want to modify an object (DataFrame or Series), the only way to do this is to directly modify that object itself.
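
A minimal sketch of what these two rules mean in practice:

>>> df = pd.DataFrame({"a": [1, 2, 3]})
>>> subset = df["a"]       # any derived object behaves as a copy
>>> subset.iloc[0] = 100   # modifies subset only; df is left unchanged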

The main goal of this change is to make the user API more consistent and predictable. There is now a clear rule: any subset or returned series/dataframe always behaves as a copy of the original, and thus never modifies the original (before pandas 3.0, whether a derived object would be a copy or a view depended on the exact operation performed, which was often confusing).

Because every single indexing step now behaves as a copy, this also means that "chained assignment" (updating a DataFrame with multiple setitem steps) stops working. Because it now consistently never works, the SettingWithCopyWarning has been removed.
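
A minimal sketch of the failing pattern and its replacement:

>>> df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> # chained assignment: the setitem lands on a temporary copy, so df is never updated
>>> # df[df["a"] > 1]["b"] = 10
>>> # instead, perform the update in a single setitem on df itself:
>>> df.loc[df["a"] > 1, "b"] = 10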

The new behavioral semantics are explained in more detail in the user guide about Copy-on-Write.

A secondary goal is to improve performance by avoiding unnecessary copies. As mentioned above, every new DataFrame or Series returned from an indexing operation or method behaves as a copy, but under the hood pandas will use views as much as possible, and only copy when needed to guarantee the "behaves as a copy" behaviour (this is the actual "copy-on-write" mechanism used as an implementation detail).

Some of the behaviour changes described above are breaking changes in pandas 3.0. When upgrading to pandas 3.0, it is recommended to first upgrade to pandas 2.3 to get deprecation warnings for a subset of those changes. The migration guide explains the upgrade process in more detail.

pd.col syntax can now be used in DataFrame.assign() and DataFrame.loc()#

You can now use pd.col to create callables for use in dataframe methods which accept them. For example, if you have a dataframe

In [1]: df = pd.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})

and you want to create a new column 'c' by summing 'a' and 'b', then instead of

In [2]: df.assign(c=lambda df: df['a'] + df['b'])
Out[2]: 
   a  b  c
0  1  4  5
1  1  5  6
2  2  6  8

you can now write:

In [3]: df.assign(c=pd.col('a') + pd.col('b'))
Out[3]: 
   a  b  c
0  1  4  5
1  1  5  6
2  2  6  8
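
Since a pd.col expression evaluates to a callable, it can also be used for boolean row selection with DataFrame.loc; a minimal sketch:

>>> df.loc[pd.col('a') == 1]
   a  b
0  1  4
1  1  5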

New Deprecation Policy#

pandas 3.0.0 introduces a new 3-stage deprecation policy: a DeprecationWarning is used initially, then switched to a FutureWarning for broader visibility in the last minor version before the next major release, and the deprecated functionality is removed in the major release. This was done to give downstream packages more time to adjust to pandas deprecations, which should reduce the number of warnings that a user gets from code that isn’t theirs. See PDEP 17 for more details.

All warnings for upcoming changes in pandas will have the base class pandas.errors.PandasChangeWarning. Users may also use the following subclasses to control warnings.
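
For example, a downstream package or application can silence all such warnings via the base class (a minimal sketch; prefer the narrowest matching subclass where possible):

>>> import warnings
>>> warnings.filterwarnings("ignore", category=pd.errors.PandasChangeWarning)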

Other enhancements#

Notable bug fixes#

These are bug fixes that might have notable behavior changes.

Improved behavior in groupby for observed=False#

A number of bugs have been fixed due to improved handling of unobserved groups (GH 55738). All remarks in this section equally impact SeriesGroupBy.

In previous versions of pandas, a single grouping with DataFrameGroupBy.apply() or DataFrameGroupBy.agg() would pass the unobserved groups to the provided function, resulting in 0 below.

In [4]: df = pd.DataFrame(
   ...:     {
   ...:         "key1": pd.Categorical(list("aabb"), categories=list("abc")),
   ...:         "key2": [1, 1, 1, 2],
   ...:         "values": [1, 2, 3, 4],
   ...:     }
   ...: )

In [5]: df
Out[5]: 
  key1  key2  values
0    a     1       1
1    a     1       2
2    b     1       3
3    b     2       4

In [6]: gb = df.groupby("key1", observed=False)

In [7]: gb[["values"]].apply(lambda x: x.sum())
Out[7]: 
      values
key1        
a          3
b          7
c          0

However, this was not the case when using multiple groupings, resulting in NaN below.

In [1]: gb = df.groupby(["key1", "key2"], observed=False)
In [2]: gb[["values"]].apply(lambda x: x.sum())
Out[2]:
           values
key1 key2        
a    1        3.0
     2        NaN
b    1        3.0
     2        4.0
c    1        NaN
     2        NaN

Now using multiple groupings will also pass the unobserved groups to the provided function.

In [8]: gb = df.groupby(["key1", "key2"], observed=False)

In [9]: gb[["values"]].apply(lambda x: x.sum())
Out[9]: 
           values
key1 key2        
a    1          3
     2          0
b    1          3
     2          4
c    1          0
     2          0

These improvements also fixed certain bugs in groupby.


Backwards incompatible API changes#

Datetime resolution inference#

Converting a sequence of strings, datetime objects, or np.datetime64 objects to a datetime64 dtype now performs inference on the appropriate resolution (AKA unit) for the output dtype. This affects Series, DataFrame, Index, DatetimeIndex, and to_datetime().

Previously, these would always give nanosecond resolution:

In [1]: dt = pd.Timestamp("2024年03月22日 11:36").to_pydatetime()
In [2]: pd.to_datetime([dt]).dtype
Out[2]: dtype('<M8[ns]')
In [3]: pd.Index([dt]).dtype
Out[3]: dtype('<M8[ns]')
In [4]: pd.DatetimeIndex([dt]).dtype
Out[4]: dtype('<M8[ns]')
In [5]: pd.Series([dt]).dtype
Out[5]: dtype('<M8[ns]')

This now infers the microsecond unit "us" from the pydatetime object, matching the scalar Timestamp behavior.

In [10]: dt = pd.Timestamp("2024年03月22日 11:36").to_pydatetime()

In [11]: pd.to_datetime([dt]).dtype
Out[11]: dtype('<M8[us]')

In [12]: pd.Index([dt]).dtype
Out[12]: dtype('<M8[us]')

In [13]: pd.DatetimeIndex([dt]).dtype
Out[13]: dtype('<M8[us]')

In [14]: pd.Series([dt]).dtype
Out[14]: dtype('<M8[us]')

Similarly, when passing a sequence of np.datetime64 objects, the resolution of the passed objects will be retained (or, for lower-than-second resolution, second resolution will be used).
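
For example, a sketch of that rule (assuming the usual import numpy as np):

>>> pd.to_datetime([np.datetime64("2024年03月22日T11:43:01.002", "ms")]).dtype
dtype('<M8[ms]')
>>> pd.to_datetime([np.datetime64("2024年03月22日", "D")]).dtype  # day resolution is below second
dtype('<M8[s]')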

When passing strings, the resolution will depend on the precision of the string, again matching the Timestamp behavior. Previously:

In [2]: pd.to_datetime(["2024年03月22日 11:43:01"]).dtype
Out[2]: dtype('<M8[ns]')
In [3]: pd.to_datetime(["2024年03月22日 11:43:01.002"]).dtype
Out[3]: dtype('<M8[ns]')
In [4]: pd.to_datetime(["2024年03月22日 11:43:01.002003"]).dtype
Out[4]: dtype('<M8[ns]')
In [5]: pd.to_datetime(["2024年03月22日 11:43:01.002003004"]).dtype
Out[5]: dtype('<M8[ns]')

The inferred resolution now matches that of the input strings:

In [15]: pd.to_datetime(["2024年03月22日 11:43:01"]).dtype
Out[15]: dtype('<M8[s]')

In [16]: pd.to_datetime(["2024年03月22日 11:43:01.002"]).dtype
Out[16]: dtype('<M8[ms]')

In [17]: pd.to_datetime(["2024年03月22日 11:43:01.002003"]).dtype
Out[17]: dtype('<M8[us]')

In [18]: pd.to_datetime(["2024年03月22日 11:43:01.002003004"]).dtype
Out[18]: dtype('<M8[ns]')

In cases with mixed-resolution inputs, the highest resolution is used:

In [2]: pd.to_datetime([pd.Timestamp("2024年03月22日 11:43:01"), "2024年03月22日 11:43:01.002"]).dtype
Out[2]: dtype('<M8[ns]')

Changed behavior in DataFrame.value_counts() and DataFrameGroupBy.value_counts() when sort=False#

In previous versions of pandas, DataFrame.value_counts() with sort=False would sort the result by row labels (as was documented). This was nonintuitive and inconsistent with Series.value_counts() which would maintain the order of the input. Now DataFrame.value_counts() will maintain the order of the input.

In [19]: df = pd.DataFrame(
   ....:     {
   ....:         "a": [2, 2, 2, 2, 1, 1, 1, 1],
   ....:         "b": [2, 1, 3, 1, 2, 3, 1, 1],
   ....:     }
   ....: )

In [20]: df
Out[20]: 
   a  b
0  2  2
1  2  1
2  2  3
3  2  1
4  1  2
5  1  3
6  1  1
7  1  1

Old behavior

In [3]: df.value_counts(sort=False)
Out[3]:
a  b
1  1    2
   2    1
   3    1
2  1    2
   2    1
   3    1
Name: count, dtype: int64

New behavior

In [21]: df.value_counts(sort=False)
Out[21]: 
a  b
2  2    1
   1    2
   3    1
1  2    1
   3    1
   1    2
Name: count, dtype: int64

This change also applies to DataFrameGroupBy.value_counts(). Here, there are two options for sorting: one sort passed to DataFrame.groupby() and one passed directly to DataFrameGroupBy.value_counts(). The former will determine whether to sort the groups, the latter whether to sort the counts. All non-grouping columns will maintain the order of the input within groups.

Old behavior

In [5]: df.groupby("a", sort=True).value_counts(sort=False)
Out[5]:
a  b
1  1    2
   2    1
   3    1
2  1    2
   2    1
   3    1
dtype: int64

New behavior

In [22]: df.groupby("a", sort=True).value_counts(sort=False)
Out[22]: 
a  b
1  2    1
   3    1
   1    2
2  2    1
   3    1
   1    2
Name: count, dtype: int64

Changed behavior of pd.offsets.Day to always represent calendar-day#

In previous versions of pandas, offsets.Day represented a fixed span of 24 hours, disregarding Daylight Savings Time transitions. It now consistently behaves as a calendar-day, preserving time-of-day across DST transitions:

Old behavior

In [5]: ts = pd.Timestamp("2025年03月08日 08:00", tz="US/Eastern")
In [6]: ts + pd.offsets.Day(1)
Out[6]: Timestamp('2025年03月09日 09:00:00-0400', tz='US/Eastern')

New behavior

In [23]: ts = pd.Timestamp("2025年03月08日 08:00", tz="US/Eastern")
In [24]: ts + pd.offsets.Day(1)
Out[24]: Timestamp('2025年03月09日 08:00:00-0400', tz='US/Eastern')

This change fixes a long-standing bug in date_range() (GH 51716, GH 35388), but causes several small behavior differences as collateral:

  • pd.offsets.Day(n) no longer compares as equal to pd.offsets.Hour(24*n)

  • offsets.Day no longer supports division

  • Timedelta no longer accepts Day objects as inputs

  • tseries.frequencies.to_offset() on a Timedelta object returns an offsets.Hour object in cases where it used to return a Day object.

  • Adding or subtracting a scalar from a timezone-aware DatetimeIndex with a Day freq no longer preserves that freq attribute.

  • Adding or subtracting a Day offset and a Timedelta is no longer supported.

  • Adding or subtracting a Day offset to or from a timezone-aware Timestamp or datetime-like may lead to an ambiguous or nonexistent time, which will raise, as shown below.
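
A sketch of the last point (the date and zone are chosen to hit a DST fall-back; per the pytz section below, such cases raise ValueError):

>>> ts = pd.Timestamp("2025年11月01日 01:30", tz="US/Eastern")
>>> ts + pd.offsets.Day(1)  # 2025年11月02日 01:30 occurs twice, so this raises ValueError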

Changed treatment of NaN values in pyarrow and numpy-nullable floating dtypes#

Previously, when dealing with a nullable dtype (e.g. Float64Dtype or int64[pyarrow]), NaN was treated as interchangeable with NA in some circumstances but not others. This was done to make adoption easier, but caused some confusion (GH 32265). In 3.0, an option "mode.nan_is_na" (default True) controls whether to treat NaN as equivalent to NA.

With pd.set_option("mode.nan_is_na", True) (again, this is the default), NaN can be passed to constructors, __setitem__, __contains__ and be treated the same as NA. The only change users will see is that arithmetic and np.ufunc operations that previously introduced NaN entries produce NA entries instead:

Old behavior:

In [2]: ser = pd.Series([0, None], dtype=pd.Float64Dtype())
In [3]: ser / 0
Out[3]:
0     NaN
1    <NA>
dtype: Float64

New behavior:

In [25]: ser = pd.Series([0, None], dtype=pd.Float64Dtype())

In [26]: ser / 0
Out[26]: 
0    <NA>
1    <NA>
dtype: Float64

By contrast, with pd.set_option("mode.nan_is_na", False), NaN is always considered distinct and specifically as a floating-point value, so cannot be used with integer dtypes:

Old behavior:

In [2]: ser = pd.Series([1, np.nan], dtype=pd.Float64Dtype())
In [3]: ser[1]
Out[3]: <NA>

New behavior:

In [27]: pd.set_option("mode.nan_is_na", False)
In [28]: ser = pd.Series([1, np.nan], dtype=pd.Float64Dtype())
In [29]: ser[1]
Out[29]: np.float64(nan)

If we had passed pd.Int64Dtype() or "int64[pyarrow]" for the dtype in the latter example, this would raise, as a float NaN cannot be held by an integer dtype.

With "mode.nan_is_na" set to False, ser.to_numpy() (and frame.values and np.asarray(obj)) will convert to object dtype if NA entries are present, where before they would coerce to NaN. To retain a float numpy dtype, explicitly pass na_value=np.nan to Series.to_numpy().

Increased minimum version for Python#

pandas 3.0.0 supports Python 3.11 and higher.

Increased minimum versions for dependencies#

Some minimum supported versions of dependencies were updated. The following required dependencies have new minimum versions:

Package   New Minimum Version
numpy     1.26.0
tzdata    2023.3

For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package                  New Minimum Version
adbc-driver-postgresql   1.2.0
adbc-driver-sqlite       1.2.0
mypy (dev)               1.9.0
beautifulsoup4           4.12.3
bottleneck               1.4.2
fastparquet              2024.11.0
fsspec                   2024.10.0
hypothesis               6.116.0
gcsfs                    2024.10.0
Jinja2                   3.1.5
lxml                     5.3.0
matplotlib               3.9.3
numba                    0.60.0
numexpr                  2.10.2
qtpy                     2.4.2
openpyxl                 3.1.5
psycopg2                 2.9.10
pyarrow                  13.0.0
pymysql                  1.1.1
pyreadstat               1.2.8
pytables                 3.10.1
python-calamine          0.3.0
pytz                     2024.2
s3fs                     2024.10.0
SciPy                    1.14.1
sqlalchemy               2.0.36
xarray                   2024.10.0
xlsxwriter               3.2.0
zstandard                0.23.0

See Dependencies and Optional dependencies for more.

pytz now an optional dependency#

pandas now uses zoneinfo from the standard library as the default timezone implementation when passing a timezone string to various methods. (GH 34916)

Old behavior:

In [1]: ts = pd.Timestamp(2024, 1, 1).tz_localize("US/Pacific")
In [2]: ts.tz
Out[2]: <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>

New behavior:

In [30]: ts = pd.Timestamp(2024, 1, 1).tz_localize("US/Pacific")
In [31]: ts.tz
Out[31]: zoneinfo.ZoneInfo(key='US/Pacific')

pytz timezone objects are still supported when passed directly, but they will no longer be returned by default from string inputs. Moreover, pytz is no longer a required dependency of pandas, but can be installed via the pip extra: pip install pandas[timezone].

Additionally, pandas no longer throws pytz exceptions for timezone operations leading to ambiguous or nonexistent times. These cases will now raise a ValueError.
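
For example (a sketch; 2025年03月09日 02:30 does not exist in US/Eastern because of the spring-forward transition):

>>> pd.Timestamp("2025年03月09日 02:30").tz_localize("US/Eastern")  # raises ValueError, not a pytz exception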

Other API changes#

  • 3rd party py.path objects are no longer explicitly supported in IO methods. Use pathlib.Path objects instead (GH 57091)

  • read_table()’s parse_dates argument defaults to None to improve consistency with read_csv() (GH 57476)

  • All classes inheriting from builtin tuple (including types created with collections.namedtuple()) are now hashed and compared as builtin tuple during indexing operations (GH 57922)

  • Made dtype a required argument in ExtensionArray._from_sequence_of_strings() (GH 56519)

  • Passing a Series input to json_normalize() will now retain the Series Index, previously output had a new RangeIndex (GH 51452)

  • Removed Index.sort(), which always raised a TypeError. The attribute is no longer defined and accessing it raises an AttributeError (GH 59283)

  • Unused dtype argument has been removed from the MultiIndex constructor (GH 60962)

  • Updated DataFrame.to_excel() so that the output spreadsheet has no styling. Custom styling can still be done using Styler.to_excel() (GH 54154)

  • pickle and HDF (.h5) files created with Python 2 are no longer explicitly supported (GH 57387)

  • pickled objects from pandas version less than 1.0.0 are no longer supported (GH 57155)

  • When comparing the indexes in testing.assert_series_equal(), check_exact defaults to True if an Index is of integer dtype (GH 57386)

  • Index set operations (like union or intersection) will now ignore the dtype of an empty RangeIndex or empty Index with object dtype when determining the dtype of the resulting Index (GH 60797)

  • IncompatibleFrequency now subclasses TypeError instead of ValueError. As a result, joins with mismatched frequencies now cast to object like other non-comparable joins, and arithmetic with indexes with mismatched frequencies align (GH 55782)

  • CategoricalIndex.append() no longer attempts to cast different-dtype indexes to the caller’s dtype (GH 41626)

  • ExtensionDtype.construct_array_type() is now a regular method instead of a classmethod (GH 58663)

  • Comparison operations between Index and Series now consistently return Series regardless of which object is on the left or right (GH 36759)

  • NumPy functions like np.isinf that return a bool dtype when called on an Index object now return a bool-dtype Index instead of np.ndarray (GH 52676); see the sketch below
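
A sketch of the last item (assuming the usual import numpy as np):

>>> idx = pd.Index([1.0, np.inf])
>>> np.isinf(idx)
Index([False, True], dtype='bool')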

Deprecations#

Copy keyword#

The copy keyword argument in the following methods is deprecated and will be removed in a future version:

Copy-on-Write utilizes a lazy copy mechanism that defers copying the data until necessary. Use .copy() to trigger an eager copy. The copy keyword has no effect starting with 3.0, so it can be safely removed from your code.
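
For instance, with a method such as DataFrame.rename() that previously accepted copy (a minimal sketch):

>>> df2 = df.rename(columns=str.upper, copy=True)  # copy is deprecated and has no effect
>>> df3 = df.rename(columns=str.upper).copy()      # use .copy() for an eager copy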

Other Deprecations#

Removal of prior version deprecations/changes#

Enforced deprecation of aliases M, Q, Y, etc. in favour of ME, QE, YE, etc. for offsets#

Renamed the following offset aliases (GH 57986):

offset                   removed alias   new alias
MonthEnd                 M               ME
BusinessMonthEnd         BM              BME
SemiMonthEnd             SM              SME
CustomBusinessMonthEnd   CBM             CBME
QuarterEnd               Q               QE
BQuarterEnd              BQ              BQE
YearEnd                  Y               YE
BYearEnd                 BY              BYE

Other Removals#

Performance improvements#

Bug fixes#

Categorical#

Datetimelike#

  • Bug in is_year_start where a DatetimeIndex constructed via date_range() with frequency "MS" wouldn't have the correct year or quarter start attributes (GH 57377)

  • Bug in DataFrame raising ValueError when dtype is timedelta64 and data is a list containing None (GH 60064)

  • Bug in Timestamp constructor failing to raise when tz=None is explicitly specified in conjunction with timezone-aware tzinfo or data (GH 48688)

  • Bug in Timestamp constructor failing to raise when given a np.datetime64 object with non-standard unit (GH 25611)

  • Bug in date_range() where the last valid timestamp would sometimes not be produced (GH 56134)

  • Bug in date_range() where using a negative frequency value would not include all points between the start and end values (GH 56147)

  • Bug in tseries.api.guess_datetime_format() would fail to infer time format when "%Y" == "%H%M" (GH 57452)

  • Bug in tseries.frequencies.to_offset() would fail to parse frequency strings starting with "LWOM" (GH 59218)

  • Bug in DataFrame.fillna() raising an AssertionError instead of OutOfBoundsDatetime when filling a datetime64[ns] column with an out-of-bounds timestamp. Now correctly raises OutOfBoundsDatetime. (GH 61208)

  • Bug in DataFrame.min() and DataFrame.max() casting datetime64 and timedelta64 columns to float64 and losing precision (GH 60850)

  • Bug in DataFrame.agg() with a DataFrame containing missing values resulting in IndexError (GH 58810)

  • Bug in DatetimeIndex.is_year_start() and DatetimeIndex.is_quarter_start() not raising on custom business day frequencies bigger than "1C" (GH 58664)

  • Bug in DatetimeIndex.is_year_start() and DatetimeIndex.is_quarter_start() returning False on double-digit frequencies (GH 58523)

  • Bug in DatetimeIndex.union() and DatetimeIndex.intersection() when unit was non-nanosecond (GH 59036)

  • Bug in Index.union() with a pyarrow timestamp dtype incorrectly returning object dtype (GH 58421)

  • Bug in Series.dt.microsecond() producing incorrect results for pyarrow backed Series. (GH 59154)

  • Bug in Timestamp.normalize() and DatetimeArray.normalize() returning incorrect results instead of raising on integer overflow for very small (distant past) values (GH 60583)

  • Bug in Timestamp.replace() failing to update unit attribute when replacement introduces non-zero nanosecond or microsecond (GH 57749)

  • Bug in to_datetime() not respecting dayfirst if an uncommon date string was passed. (GH 58859)

  • Bug in to_datetime() on float array with missing values throwing FloatingPointError (GH 58419)

  • Bug in to_datetime() on a float32 DataFrame with year, month, day, etc. columns leading to precision issues and incorrect results (GH 60506)

  • Bug in to_datetime() reporting an incorrect index in failure scenarios (GH 58298)

  • Bug in to_datetime() with format="ISO8601" and utc=True where naive timestamps incorrectly inherited timezone offset from previous timestamps in a series. (GH 61389)

  • Bug in to_datetime() wrongly converting when arg is a np.datetime64 object with a unit of "ps" (GH 60341)

  • Bug in comparison between objects with np.datetime64 dtype and timestamp[pyarrow] dtypes incorrectly raising TypeError (GH 60937)

  • Bug in comparison between objects with pyarrow date dtype and timestamp[pyarrow] or np.datetime64 dtype failing to consider these as non-comparable (GH 62157)

  • Bug in constructing arrays with ArrowDtype with timestamp type incorrectly allowing Decimal("NaN") (GH 61773)

  • Bug in constructing arrays with a timezone-aware ArrowDtype from timezone-naive datetime objects incorrectly treating those as UTC times instead of wall times like DatetimeTZDtype (GH 61775)

  • Bug in setting scalar values with mismatched resolution into arrays with non-nanosecond datetime64, timedelta64 or DatetimeTZDtype incorrectly truncating those scalars (GH 56410)

Timedelta#

  • Accuracy improvement in Timedelta.to_pytimedelta() to round microseconds consistently for large nanosecond based Timedelta (GH 57841)

  • Bug in Timedelta constructor failing to raise when passed an invalid keyword (GH 53801)

  • Bug in DataFrame.cumsum() which was raising IndexError if dtype is timedelta64[ns] (GH 57956)

  • Bug in multiplication operations with timedelta64 dtype failing to raise TypeError when multiplying by bool objects or dtypes (GH 58054)

Timezones#

  • Bug in DatetimeIndex.union(), DatetimeIndex.intersection(), and DatetimeIndex.symmetric_difference() changing timezone to UTC when merging two DatetimeIndex objects with the same timezone but different units (GH 60080)

  • Bug in Series.dt.tz_localize() with a timezone-aware ArrowDtype incorrectly converting to UTC when tz=None (GH 61780)

  • Fixed bug in date_range() where tz-aware endpoints with calendar offsets (e.g. "MS") failed on DST fall-back. These now respect the ambiguous and nonexistent arguments (GH 52908)

Numeric#

Conversion#

Strings#

Interval#

Indexing#

  • Bug in DataFrame.__getitem__() returning modified columns when called with slice in Python 3.12 (GH 57500)

  • Bug in DataFrame.__getitem__() when slicing a DataFrame with many rows raised an OverflowError (GH 59531)

  • Bug in DataFrame.from_records() throwing a ValueError when passed an empty list in index (GH 58594)

  • Bug in DataFrame.loc() and DataFrame.iloc() returning incorrect dtype when selecting from a DataFrame with mixed data types. (GH 60600)

  • Bug in DataFrame.loc() with inconsistent behavior of loc-set with 2 given indexes to Series (GH 59933)

  • Bug in Index.equals() when comparing between Series with string dtype Index (GH 61099)

  • Bug in Index.get_indexer() and similar methods when NaN is located at or after position 128 (GH 58924)

  • Bug in MultiIndex.insert() when a new value inserted to a datetime-like level gets cast to NaT and fails indexing (GH 60388)

  • Bug in Series.__setitem__() when assigning a boolean Series with a boolean indexer raising LossySetitemError (GH 57338)

  • Bug in printing Index.names and MultiIndex.levels would not escape single quotes (GH 60190)

  • Bug in reindexing of DataFrame with PeriodDtype columns in case of consolidated block (GH 60980, GH 60273)

  • Bug in DataFrame.loc.__getitem__() and DataFrame.iloc.__getitem__() with a CategoricalDtype column with integer categories raising when trying to index a row containing a NaN entry (GH 58954)

  • Bug in Index.__getitem__() incorrectly raising with a 0-dim np.ndarray key (GH 55601)

  • Bug in adding new rows with DataFrame.loc.__setitem__() or Series.loc.__setitem__ which failed to retain dtype on the object’s index in some cases (GH 41626)

  • Bug in indexing on a DatetimeIndex with a timestamp[pyarrow] dtype or on a TimedeltaIndex with a duration[pyarrow] dtype (GH 62277)

Missing#

MultiIndex#

I/O#

Period#

Plotting#

Groupby/resample/rolling#

  • Bug in DataFrameGroupBy.__len__() and SeriesGroupBy.__len__() would raise when the grouping contained NA values and dropna=False (GH 58644)

  • Bug in DataFrameGroupBy.any() that returned True for groups where all Timedelta values are NaT. (GH 59712)

  • Bug in DataFrameGroupBy.groups() and SeriesGroupBy.groups() would fail when the groups were Categorical with an NA value (GH 61356)

  • Bug in DataFrameGroupBy.groups() and SeriesGroupBy.groups() that would not respect the groupby argument dropna (GH 55919)

  • Bug in DataFrameGroupBy.median() where NaT values gave an incorrect result (GH 57926)

  • Bug in DataFrameGroupBy.quantile() with interpolation="nearest" being inconsistent with DataFrame.quantile() (GH 47942)

  • Bug in Resampler.interpolate() on a DataFrame with non-uniform sampling and/or indices not aligning with the resulting resampled index would result in wrong interpolation (GH 21351)

  • Bug in Series.rolling() when used with a BaseIndexer subclass and computing min/max (GH 46726)

  • Bug in DataFrame.ewm() and Series.ewm() when passed times and aggregation functions other than mean (GH 51695)

  • Bug in DataFrame.resample() and Series.resample() were not keeping the index name when the index had ArrowDtype timestamp dtype (GH 61222)

  • Bug in DataFrame.resample() changing index type to MultiIndex when the dataframe is empty and using an upsample method (GH 55572)

  • Bug in DataFrameGroupBy.agg() and SeriesGroupBy.agg() that was returning numpy dtype values when input values are pyarrow dtype values, instead of returning pyarrow dtype values. (GH 53030)

  • Bug in DataFrameGroupBy.agg() that raises AttributeError when there is dictionary input and duplicated columns, instead of returning a DataFrame with the aggregation of all duplicate columns. (GH 55041)

  • Bug in DataFrameGroupBy.agg() where applying a user-defined function to an empty DataFrame returned a Series instead of an empty DataFrame. (GH 61503)

  • Bug in DataFrameGroupBy.apply() and SeriesGroupBy.apply() for empty data frame with group_keys=False still creating output index using group keys. (GH 60471)

  • Bug in DataFrameGroupBy.apply() and SeriesGroupBy.apply() not preserving _metadata attributes from subclassed DataFrames and Series (GH 62134)

  • Bug in DataFrameGroupBy.apply() that was returning a completely empty DataFrame when all return values of func were None instead of returning an empty DataFrame with the original columns and dtypes. (GH 57775)

  • Bug in DataFrameGroupBy.apply() with as_index=False that was returning MultiIndex instead of returning Index. (GH 58291)

  • Bug in DataFrameGroupBy.cumsum() and DataFrameGroupBy.cumprod() where numeric_only parameter was passed indirectly through kwargs instead of passing directly. (GH 58811)

  • Bug in DataFrameGroupBy.cumsum() where it did not return the correct dtype when the label contained None. (GH 58811)

  • Bug in DataFrameGroupBy.transform() and SeriesGroupBy.transform() with a reducer and observed=False coercing dtype to float when there are unobserved categories (GH 55326)

  • Bug in Rolling.apply() for method="table" where column order was not being respected due to the columns getting sorted by default. (GH 59666)

  • Bug in Rolling.apply() where the applied function could be called on fewer than min_period periods if method="table". (GH 58868)

  • Bug in Series.resample() could raise when the date range ended shortly before a non-existent time. (GH 58380)

Reshaping#

Sparse#

ExtensionArray#

  • Bug in Categorical when constructing with an Index with ArrowDtype (GH 60563)

  • Bug in arrays.ArrowExtensionArray.__setitem__() which caused wrong behavior when using an integer array with repeated values as a key (GH 58530)

  • Bug in ArrowExtensionArray.factorize() where NA values were dropped when input was dictionary-encoded even when dropna was set to False (GH 60567)

  • Bug in api.types.is_datetime64_any_dtype() where a custom ExtensionDtype would return False for array-likes (GH 57055)

  • Bug in comparison between object with ArrowDtype and incompatible-dtyped (e.g. string vs bool) incorrectly raising instead of returning all-False (for ==) or all-True (for !=) (GH 59505)

  • Bug in constructing pandas data structures when passing into dtype a string of the type followed by [pyarrow] while PyArrow is not installed would raise NameError rather than ImportError (GH 57928)

  • Bug in various DataFrame reductions for pyarrow temporal dtypes returning incorrect dtype when result was null (GH 59234)

Styler#

  • Bug in Styler.to_latex() where styling column headers failed when combined with a hidden index or hidden index levels.

Other#

Contributors#