pandas.DataFrame.reindex#

DataFrame.reindex(labels=None, *, index=None, columns=None, axis=None, method=None, copy=None, level=None, fill_value=nan, limit=None, tolerance=None)[source] #

Conform DataFrame to new index with optional filling logic.

Places NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.

Parameters:
labelsarray-like, optional

New labels / index to conform the axis specified by ‘axis’ to.

indexarray-like, optional

New labels for the index. Preferably an Index object to avoid duplicating data.

columnsarray-like, optional

New labels for the columns. Preferably an Index object to avoid duplicating data.

axisint or str, optional

Axis to target. Can be either the axis name (‘index’, ‘columns’) or number (0, 1).

method{None, ‘backfill’/’bfill’, ‘pad’/’ffill’, ‘nearest’}

Method to use for filling holes in reindexed DataFrame. Please note: this is only applicable to DataFrames/Series with a monotonically increasing/decreasing index.

  • None (default): don’t fill gaps

  • pad / ffill: Propagate last valid observation forward to next valid.

  • backfill / bfill: Use next valid observation to fill gap.

  • nearest: Use nearest valid observations to fill gap.

copybool, default True

Return a new object, even if the passed indexes are the same.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

levelint or name

Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_valuescalar, default np.nan

Value to use for missing values. Defaults to NaN, but can be any "compatible" value.

limitint, default None

Maximum number of consecutive elements to forward or backward fill.

toleranceoptional

Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations most satisfy the equation abs(index[indexer] - target) <= tolerance.

Tolerance may be a scalar value, which applies the same tolerance to all values, or list-like, which applies variable tolerance per element. List-like includes list, tuple, array, Series, and must be the same size as the index and its dtype must exactly match the index’s type.

Returns:
DataFrame with changed index.

See also

DataFrame.set_index

Set row labels.

DataFrame.reset_index

Remove row labels or move them to new columns.

DataFrame.reindex_like

Change to same indices as other DataFrame.

Examples

DataFrame.reindex supports two calling conventions

  • (index=index_labels, columns=column_labels, ...)

  • (labels, axis={'index', 'columns'}, ...)

We highly recommend using keyword arguments to clarify your intent.

Create a dataframe with some fictional data.

>>> index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
>>> df = pd.DataFrame({'http_status': [200, 200, 404, 404, 301],
...  'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
...  index=index)
>>> df
 http_status response_time
Firefox 200 0.04
Chrome 200 0.02
Safari 404 0.07
IE10 404 0.08
Konqueror 301 1.00

Create a new index and reindex the dataframe. By default values in the new index that do not have corresponding records in the dataframe are assigned NaN.

>>> new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
...  'Chrome']
>>> df.reindex(new_index)
 http_status response_time
Safari 404.0 0.07
Iceweasel NaN NaN
Comodo Dragon NaN NaN
IE10 404.0 0.08
Chrome 200.0 0.02

We can fill in the missing values by passing a value to the keyword fill_value. Because the index is not monotonically increasing or decreasing, we cannot use arguments to the keyword method to fill the NaN values.

>>> df.reindex(new_index, fill_value=0)
 http_status response_time
Safari 404 0.07
Iceweasel 0 0.00
Comodo Dragon 0 0.00
IE10 404 0.08
Chrome 200 0.02
>>> df.reindex(new_index, fill_value='missing')
 http_status response_time
Safari 404 0.07
Iceweasel missing missing
Comodo Dragon missing missing
IE10 404 0.08
Chrome 200 0.02

We can also reindex the columns.

>>> df.reindex(columns=['http_status', 'user_agent'])
 http_status user_agent
Firefox 200 NaN
Chrome 200 NaN
Safari 404 NaN
IE10 404 NaN
Konqueror 301 NaN

Or we can use "axis-style" keyword arguments

>>> df.reindex(['http_status', 'user_agent'], axis="columns")
 http_status user_agent
Firefox 200 NaN
Chrome 200 NaN
Safari 404 NaN
IE10 404 NaN
Konqueror 301 NaN

To further illustrate the filling functionality in reindex, we will create a dataframe with a monotonically increasing index (for example, a sequence of dates).

>>> date_index = pd.date_range('1/1/2010', periods=6, freq='D')
>>> df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
...  index=date_index)
>>> df2
 prices
2010年01月01日 100.0
2010年01月02日 101.0
2010年01月03日 NaN
2010年01月04日 100.0
2010年01月05日 89.0
2010年01月06日 88.0

Suppose we decide to expand the dataframe to cover a wider date range.

>>> date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')
>>> df2.reindex(date_index2)
 prices
2009年12月29日 NaN
2009年12月30日 NaN
2009年12月31日 NaN
2010年01月01日 100.0
2010年01月02日 101.0
2010年01月03日 NaN
2010年01月04日 100.0
2010年01月05日 89.0
2010年01月06日 88.0
2010年01月07日 NaN

The index entries that did not have a value in the original data frame (for example, ‘2009年12月29日’) are by default filled with NaN. If desired, we can fill in the missing values using one of several options.

For example, to back-propagate the last valid value to fill the NaN values, pass bfill as an argument to the method keyword.

>>> df2.reindex(date_index2, method='bfill')
 prices
2009年12月29日 100.0
2009年12月30日 100.0
2009年12月31日 100.0
2010年01月01日 100.0
2010年01月02日 101.0
2010年01月03日 NaN
2010年01月04日 100.0
2010年01月05日 89.0
2010年01月06日 88.0
2010年01月07日 NaN

Please note that the NaN value present in the original dataframe (at index value 2010年01月03日) will not be filled by any of the value propagation schemes. This is because filling while reindexing does not look at dataframe values, but only compares the original and desired indexes. If you do want to fill in the NaN values present in the original dataframe, use the fillna() method.

See the user guide for more.