Indexing and selecting data#

Xarray offers extremely flexible indexing routines that combine the best features of NumPy and pandas for data selection.

The most basic way to access elements of a DataArray object is to use Python’s [] syntax, such as array[i, j], where i and j are both integers. As xarray objects can store coordinates corresponding to each dimension of an array, label-based indexing similar to pandas.DataFrame.loc is also possible. In label-based indexing, the element position i is automatically looked up from the coordinate values.

Dimensions of xarray objects have names, so you can also look up the dimensions by name instead of remembering their positional order.

Quick overview#

In total, xarray supports four different kinds of indexing, as described below and summarized in this table:

Dimension lookup	Index lookup	`DataArray` syntax	`Dataset` syntax
Positional	By integer	`da[:, 0]`	not available
Positional	By label	`da.loc[:, 'IA']`	not available
By name	By integer	`da.isel(space=0)` or `da[dict(space=0)]`	`ds.isel(space=0)` or `ds[dict(space=0)]`
By name	By label	`da.sel(space='IA')` or `da.loc[dict(space='IA')]`	`ds.sel(space='IA')` or `ds.loc[dict(space='IA')]`
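
As a quick check of the table, the sketch below builds a small array (its values and coordinates are illustrative assumptions) and confirms that the four `DataArray` forms select the same column.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Illustrative data: 4 time steps by 3 locations.
da = xr.DataArray(
    np.arange(12).reshape(4, 3),
    coords=[
        ("time", pd.date_range("2000-01-01", periods=4)),
        ("space", ["IA", "IL", "IN"]),
    ],
)

positional_by_integer = da[:, 0]
positional_by_label = da.loc[:, "IA"]
named_by_integer = da.isel(space=0)
named_by_label = da.sel(space="IA")

# All four selections pick the first column along "space".
assert positional_by_integer.equals(positional_by_label)
assert positional_by_integer.equals(named_by_integer)
assert positional_by_integer.equals(named_by_label)
```

The name-based forms keep working if the dimensions are later transposed, which is the main reason to prefer them in longer pipelines.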

More advanced indexing is also possible for all the methods by supplying DataArray objects as indexers. See Vectorized Indexing for details.

Masking with `where`#

Indexing methods on xarray objects generally return a subset of the original data. However, it is sometimes useful to select an object with the same shape as the original data, but with some elements masked. To do this type of selection in xarray, use where():

import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(16).reshape(4, 4), dims=["x", "y"])
da.where(da.x + da.y < 4)

<xarray.DataArray (x: 4, y: 4)> Size: 128B
array([[ 0., 1., 2., 3.],
 [ 4., 5., 6., nan],
 [ 8., 9., nan, nan],
 [12., nan, nan, nan]])
Dimensions without coordinates: x, y

This is particularly useful for ragged indexing of multi-dimensional data, e.g., to apply a 2D mask to an image. Note that where follows all the usual xarray broadcasting and alignment rules for binary operations (e.g., +) between the object being indexed and the condition, as described in Computation:

da.where(da.y < 2)

<xarray.DataArray (x: 4, y: 4)> Size: 128B
array([[ 0., 1., nan, nan],
 [ 4., 5., nan, nan],
 [ 8., 9., nan, nan],
 [12., 13., nan, nan]])
Dimensions without coordinates: x, y
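
To make the image-mask case concrete, here is a minimal sketch. The array `image` and both masks are made-up illustrations. A 2D condition masks element by element, while a 1D condition is broadcast along the dimension it lacks.

```python
import numpy as np
import xarray as xr

image = xr.DataArray(np.arange(12.0).reshape(3, 4), dims=["row", "col"])

# A 2D mask with the same dimensions as the image: keep the upper-left triangle.
mask_2d = xr.DataArray(
    np.array(
        [
            [True, True, True, False],
            [True, True, False, False],
            [True, False, False, False],
        ]
    ),
    dims=["row", "col"],
)
masked = image.where(mask_2d)

# A 1D mask over "col" only is broadcast across every row.
mask_cols = xr.DataArray([True, False, True, False], dims=["col"])
masked_cols = image.where(mask_cols)

print(masked)
print(masked_cols)
```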

By default, where maintains the original size of the data. When the selected data is much smaller than the original, the option drop=True clips coordinate elements that are fully masked:

da.where(da.y < 2, drop=True)

<xarray.DataArray (x: 4, y: 2)> Size: 64B
array([[ 0., 1.],
 [ 4., 5.],
 [ 8., 9.],
 [12., 13.]])
Dimensions without coordinates: x, y
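
A short sketch of what "fully masked" means when the condition is ragged (the condition is an illustrative choice): `drop=True` removes a position along a dimension only when nothing survives there, so partially masked rows and columns keep their NaN entries.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(16).reshape(4, 4), dims=["x", "y"])

# Only the elements at (0, 0), (0, 1) and (1, 0) satisfy the condition.
clipped = da.where(da.x + da.y < 2, drop=True)

# Rows and columns 2 and 3 are dropped; element (1, 1) stays as NaN.
print(clipped.shape)
print(clipped.values)
```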

Selecting values with `isin`#

To check whether elements of an xarray object equal a single value, compare with the equality operator == (e.g., arr == 3). To check against multiple values, use isin():

da = xr.DataArray([1, 2, 3, 4, 5], dims=["x"])
da.isin([2, 4])

<xarray.DataArray (x: 5)> Size: 5B
array([False, True, False, True, False])
Dimensions without coordinates: x

isin() works particularly well with where() to support indexing by arrays that are not already labels of an array:

lookup = xr.DataArray([-1, -2, -3, -4, -5], dims=["x"])
da.where(lookup.isin([-2, -4]), drop=True)

<xarray.DataArray (x: 2)> Size: 16B
array([2., 4.])
Dimensions without coordinates: x

However, some caution is in order: when done repeatedly, this type of indexing is significantly slower than using sel().
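
When the lookup values are already coordinate labels, `sel()` gives the same selection directly. The sketch below assumes a hypothetical array whose `x` coordinate holds the lookup codes. Note that `where()` promotes the integer data to float to make room for NaN, while `sel()` preserves the dtype.

```python
import numpy as np
import xarray as xr

# Hypothetical setup: the lookup codes are stored as the "x" coordinate.
da = xr.DataArray([1, 2, 3, 4, 5], dims=["x"], coords={"x": [-1, -2, -3, -4, -5]})

via_where = da.where(da.x.isin([-2, -4]), drop=True)
via_sel = da.sel(x=[-2, -4])

# Same selected values, different dtypes.
assert np.array_equal(via_where.values, via_sel.values)
print(via_where.dtype, via_sel.dtype)
```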

Missing coordinate labels#

Coordinate labels for each dimension are optional (as of xarray v0.9). Label-based indexing with .sel and .loc falls back to standard positional, integer-based indexing for dimensions without a coordinate label:

da = xr.DataArray([1, 2, 3], dims="x")
da.sel(x=[0, -1])

<xarray.DataArray (x: 2)> Size: 16B
array([1, 3])
Dimensions without coordinates: x
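
A minimal check of this fallback, using the same array: with no coordinate on `x`, the label-based and integer-based calls return the same result.

```python
import xarray as xr

da = xr.DataArray([1, 2, 3], dims="x")

by_label = da.sel(x=[0, -1])      # no coordinate on "x", so this is positional
by_position = da.isel(x=[0, -1])

assert by_label.equals(by_position)
```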

Alignment between xarray objects where one or both do not have coordinate labels succeeds only if all dimensions of the same name have the same length. Otherwise, it raises an informative error:

xr.align(da, da[:2])

AlignmentError: cannot reindex or align along dimension 'x' because of conflicting dimension sizes: {2, 3}
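
The sketch below contrasts the two cases. It catches `ValueError`, on the assumption that the alignment error shown above is a subclass of it (older xarray versions raise a plain `ValueError` here).

```python
import xarray as xr

da = xr.DataArray([1, 2, 3], dims="x")

# Same length along "x": alignment succeeds.
left, right = xr.align(da, da * 10)
assert left.sizes["x"] == right.sizes["x"] == 3

# Different lengths along "x" and no labels to align on: alignment fails.
# Assumption: the raised error is (a subclass of) ValueError.
try:
    xr.align(da, da[:2])
except ValueError as err:
    failed = True
    print(f"alignment failed: {err}")
else:
    failed = False
```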

Underlying Indexes#

Xarray uses the pandas.Index internally to perform indexing operations. If you need to access the underlying indexes, they are available through the indexes attribute.

da = xr.DataArray(
 np.random.rand(4, 3),
 [
 ("time", pd.date_range("2000-01-01", periods=4)),
 ("space", ["IA", "IL", "IN"]),
 ],
)
da

<xarray.DataArray (time: 4, space: 3)> Size: 96B
array([[0.17091717, 0.39465901, 0.64166617],
 [0.27459243, 0.46235433, 0.87137165],
 [0.40113122, 0.61058827, 0.11796713],
 [0.70218436, 0.41403366, 0.34234521]])
Coordinates:
 * time (time) datetime64[us] 32B 2000-01-01 2000-01-02 ... 2000-01-04
 * space (space) <U2 24B 'IA' 'IL' 'IN'

da.indexes

Indexes:
 time DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04'], dtype='datetime64[us]', name='time', freq='D')
 space Index(['IA', 'IL', 'IN'], dtype='str', name='space')

da.indexes["time"]

DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04'], dtype='datetime64[us]', name='time', freq='D')
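
Because these are ordinary pandas indexes, pandas methods can be applied to them directly. The sketch below, with array values chosen only for illustration, uses `pandas.Index.get_loc` to translate a label into the integer position expected by `isel()`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Illustrative data with the same coordinates as the example above.
da = xr.DataArray(
    np.arange(12).reshape(4, 3),
    coords=[
        ("time", pd.date_range("2000-01-01", periods=4)),
        ("space", ["IA", "IL", "IN"]),
    ],
)

# Look up the integer position of the label "IL" in the pandas index.
position = da.indexes["space"].get_loc("IL")

# Selecting by that position matches selecting by label.
assert da.isel(space=position).equals(da.sel(space="IL"))
```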

Use get_index() to get an index for a dimension, falling back to a default pandas.RangeIndex if it has no coordinate labels:

da = xr.DataArray([1, 2, 3], dims="x")
da

<xarray.DataArray (x: 3)> Size: 24B
array([1, 2, 3])
Dimensions without coordinates: x

da.get_index("x")

RangeIndex(start=0, stop=3, step=1, name='x')
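
The difference between `indexes` and `get_index()` shows up most clearly side by side. In this sketch the names `unlabeled` and `labeled` are illustrative: a dimension without coordinate labels has no entry in `indexes`, yet `get_index()` still returns a usable default.

```python
import pandas as pd
import xarray as xr

unlabeled = xr.DataArray([1, 2, 3], dims="x")
labeled = xr.DataArray([1, 2, 3], coords={"x": [10, 20, 30]}, dims="x")

# No coordinate labels: nothing stored in `indexes`, default RangeIndex on request.
assert "x" not in unlabeled.indexes
assert isinstance(unlabeled.get_index("x"), pd.RangeIndex)

# With coordinate labels, get_index() returns the stored index.
assert list(labeled.get_index("x")) == [10, 20, 30]
```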

Copies vs. Views#

Whether array indexing returns a view or a copy of the underlying data depends on the nature of the labels.

For positional (integer) indexing, xarray follows the same rules as NumPy:

Positional indexing with only integers and slices returns a view.
Positional indexing with arrays or lists returns a copy.

The rules for label-based indexing are more complex:

Label-based indexing with only slices returns a view.
Label-based indexing with arrays returns a copy.
Label-based indexing with scalars returns a view or a copy, depending on whether the corresponding positional indexer can be represented as an integer or a slice object. The exact rules are determined by pandas.

Whether data is a copy or a view is more predictable in xarray than in pandas, so unlike pandas, xarray does not produce SettingWithCopy warnings. However, you should still avoid assignment with chained indexing.

Note that other operations (such as accessing the values attribute) may also return views rather than copies.
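
One way to inspect the positional rules is `np.shares_memory`. This sketch assumes the data is held in memory as a plain NumPy array; lazily loaded or Dask-backed data may behave differently.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(12).reshape(3, 4), dims=["x", "y"])

sliced = da.isel(y=slice(0, 2))   # integers and slices only
picked = da.isel(y=[0, 1])        # list indexer

shares_with_slice = np.shares_memory(da.data, sliced.data)
shares_with_list = np.shares_memory(da.data, picked.data)
print(shares_with_slice, shares_with_list)

# Writing into the list-indexed result leaves the original untouched.
picked[0, 0] = -1
print(int(da[0, 0]))
```

This is also why assignment through chained indexing is fragile: whether the write reaches the original depends on which kind of indexer produced the intermediate object.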

Indexing rules#

Here we describe the full rules xarray uses for vectorized indexing. Note that this is for the purposes of explanation: for the sake of efficiency and to support various backends, the actual implementation is different.

1. (Only for label-based indexing.) Look up positional indexes along each dimension from the corresponding pandas.Index.
2. A full slice object : is inserted for each dimension without an indexer.
3. slice objects are converted into arrays, given by np.arange(*slice.indices(...)).
4. Assume dimension names for array indexers without dimensions, such as np.ndarray and list, from the dimensions to be indexed along. For example, v.isel(x=[0, 1]) is understood as v.isel(x=xr.DataArray([0, 1], dims=['x'])).
5. For each variable in a Dataset or DataArray (the array and its coordinates):
   1. Broadcast all relevant indexers based on their dimension names (see Broadcasting by dimension name for full details).
   2. Index the underlying array by the broadcast indexers, using NumPy’s advanced indexing rules.
6. If any indexer DataArray has coordinates and no coordinate with the same name exists, attach them to the indexed object.
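
The rules above can be observed on a small array. In this sketch the array `v` and the dimension name `points` are illustrative: indexers with different dimension names broadcast to an outer product, while indexers that share a dimension are applied pointwise along it.

```python
import numpy as np
import xarray as xr

v = xr.DataArray(np.arange(12).reshape(3, 4), dims=["x", "y"])

# A bare list takes the name of the dimension it indexes.
assert v.isel(x=[0, 1]).equals(v.isel(x=xr.DataArray([0, 1], dims=["x"])))

# Indexers along different dimensions broadcast to an outer product (2 x 2).
outer = v.isel(x=[0, 1], y=[0, 1])

# Indexers sharing the new dimension "points" select v[0, 0] and v[1, 1].
points_x = xr.DataArray([0, 1], dims=["points"])
points_y = xr.DataArray([0, 1], dims=["points"])
pointwise = v.isel(x=points_x, y=points_y)

print(outer.dims, pointwise.dims)
```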

Note

Only 1-dimensional boolean arrays can be used as indexers.
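
A brief sketch of this restriction, using illustrative data: a 1-dimensional boolean array selects positions along a single dimension, while a multi-dimensional condition is better expressed with `where()`, as in the masking section.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(12).reshape(3, 4), dims=["x", "y"])

# 1-dimensional boolean indexer: keep the first and last rows.
keep_rows = np.array([True, False, True])
selected = da.isel(x=keep_rows)

# For a 2D condition, mask with where() instead of indexing.
masked = da.where(da > 5)

print(selected.shape, masked.shape)
```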
