Indexing and selecting data
Xarray offers extremely flexible indexing routines that combine the best features of NumPy and pandas for data selection.
The most basic way to access elements of a DataArray
object is to use Python’s [] syntax, such as array[i, j], where
i and j are both integers.
As xarray objects can store coordinates corresponding to each dimension of an
array, label-based indexing similar to pandas.DataFrame.loc is also possible.
In label-based indexing, the element position i is automatically
looked up from the coordinate values.
Dimensions of xarray objects have names, so you can also look up the dimensions by name, instead of remembering their positional order.
Quick overview
In total, xarray supports four different kinds of indexing, as described below and summarized in this table:
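| Dimension lookup | Index lookup | DataArray syntax | Dataset syntax |
|---|---|---|---|
| Positional | By integer | da[:, 0] | not available |
| Positional | By label | da.loc[:, "IA"] | not available |
| By name | By integer | da.isel(space=0) or da[dict(space=0)] | ds.isel(space=0) or ds[dict(space=0)] |
| By name | By label | da.sel(space="IA") or da.loc[dict(space="IA")] | ds.sel(space="IA") or ds.loc[dict(space="IA")] |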
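As a quick sanity check of the table, the sketch below applies all four kinds of indexing to a small, deterministic array (the names sample and sample_ds are illustrative only) and shows that they select the same column.

```python
import numpy as np
import pandas as pd
import xarray as xr

# A small array with labeled coordinates on both dimensions
sample = xr.DataArray(
    np.arange(12.0).reshape(4, 3),
    [
        ("time", pd.date_range("2000-01-01", periods=4)),
        ("space", ["IA", "IL", "IN"]),
    ],
)
sample_ds = sample.to_dataset(name="foo")

by_position = sample[:, 0]                # positional dimension, integer index
by_position_label = sample.loc[:, "IA"]   # positional dimension, label index
by_name = sample.isel(space=0)            # named dimension, integer index
by_name_label = sample.sel(space="IA")    # named dimension, label index

# Only the named forms are available on a Dataset
ds_by_name = sample_ds.isel(space=0)
ds_by_name_label = sample_ds.sel(space="IA")
```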
More advanced indexing is also possible for all the methods by
supplying DataArray objects as indexers.
See Vectorized Indexing for the details.
Positional indexing
Indexing a DataArray directly works (mostly) just like it
does for numpy arrays, except that the returned object is always another
DataArray:
import numpy as np
import pandas as pd
import xarray as xr
da = xr.DataArray(
    np.random.rand(4, 3),
    [
        ("time", pd.date_range("2000-01-01", periods=4)),
        ("space", ["IA", "IL", "IN"]),
    ],
)
da[:2]
<xarray.DataArray (time: 2, space: 3)> Size: 48B
array([[0.12696983, 0.96671784, 0.26047601],
       [0.89723652, 0.37674972, 0.33622174]])
Coordinates:
  * time     (time) datetime64[us] 16B 2000-01-01 2000-01-02
  * space    (space) <U2 24B 'IA' 'IL' 'IN'
da[0, 0]
<xarray.DataArray ()> Size: 8B
array(0.12696983)
Coordinates:
    time     datetime64[us] 8B 2000-01-01
    space    <U2 8B 'IA'
da[:, [2, 1]]
<xarray.DataArray (time: 4, space: 2)> Size: 64B
array([[0.26047601, 0.96671784],
       [0.33622174, 0.37674972],
       [0.12310214, 0.84025508],
       [0.44799682, 0.37301223]])
Coordinates:
  * time     (time) datetime64[us] 32B 2000-01-01 2000-01-02 ... 2000-01-04
  * space    (space) <U2 16B 'IN' 'IL'
Attributes are preserved in all indexing operations.
Warning
Positional indexing deviates from NumPy when indexing with multiple
arrays like da[[0, 1], [0, 1]], as described in
Vectorized Indexing.
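To make the difference concrete, here is a minimal sketch on a plain 3x4 array (the names arr, sample and points are illustrative). NumPy pairs the two index lists element by element, while xarray selects along each dimension independently; giving both indexers a shared dimension recovers the NumPy-style pointwise result.

```python
import numpy as np
import xarray as xr

arr = np.arange(12).reshape(3, 4)
sample = xr.DataArray(arr, dims=["x", "y"])

# NumPy pairs the two index lists element by element: picks (0, 0) and (1, 1)
numpy_result = arr[[0, 1], [0, 1]]

# xarray applies each list independently along its own dimension
# (orthogonal indexing), giving every combination of rows and columns
xarray_result = sample[[0, 1], [0, 1]]

# For NumPy-style pointwise selection, give both indexers a shared dimension
points = xr.DataArray([0, 1], dims="points")
pointwise = sample[points, points]
```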
Xarray also supports label-based indexing, just like pandas. Because
we use a pandas.Index under the hood, label-based indexing is very
fast. To do label-based indexing, use the loc attribute:
da.loc["2000-01-01":"2000-01-02", "IA"]
<xarray.DataArray (time: 2)> Size: 16B
array([0.12696983, 0.89723652])
Coordinates:
  * time     (time) datetime64[us] 16B 2000-01-01 2000-01-02
    space    <U2 8B 'IA'
In this example, the selection is the subset of the array
in the range ‘2000-01-01’:’2000-01-02’ along the first coordinate, time,
with the value ‘IA’ along the second coordinate, space.
You can perform any of the label indexing operations supported by pandas, including indexing with individual labels, slices and lists/arrays of labels, as well as indexing with boolean arrays. Like pandas, label-based indexing in xarray is inclusive of both the start and stop bounds.
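The inclusive bounds are easy to miss, so here is a small sketch contrasting a label slice with a positional slice on a string-labeled array (the name series is illustrative).

```python
import xarray as xr

series = xr.DataArray([10, 20, 30, 40], coords={"x": ["a", "b", "c", "d"]}, dims="x")

# Label-based slice: both the start and the stop label are included
by_label = series.loc["a":"b"]
# Positional slice: the stop position is excluded, as in plain Python
by_position = series[0:1]
# A list of labels selects individual entries, in the order given
by_list = series.loc[["d", "a"]]
```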
Setting values with label based indexing is also supported:
da.loc["2000-01-01", ["IL", "IN"]] = -10
da
<xarray.DataArray (time: 4, space: 3)> Size: 96B
array([[  0.12696983, -10.        , -10.        ],
       [  0.89723652,   0.37674972,   0.33622174],
       [  0.45137647,   0.84025508,   0.12310214],
       [  0.5430262 ,   0.37301223,   0.44799682]])
Coordinates:
  * time     (time) datetime64[us] 32B 2000-01-01 2000-01-02 ... 2000-01-04
  * space    (space) <U2 24B 'IA' 'IL' 'IN'
Indexing with dimension names
With the dimension names, we do not have to rely on dimension order and can use them explicitly to slice data. There are two ways to do this:
Use the sel() and isel() convenience methods:
# index by integer array indices
da.isel(space=0, time=slice(None, 2))
<xarray.DataArray (time: 2)> Size: 16B
array([0.12696983, 0.89723652])
Coordinates:
  * time     (time) datetime64[us] 16B 2000-01-01 2000-01-02
    space    <U2 8B 'IA'
# index by dimension coordinate labels
da.sel(time=slice("2000-01-01", "2000-01-02"))
<xarray.DataArray (time: 2, space: 3)> Size: 48B
array([[  0.12696983, -10.        , -10.        ],
       [  0.89723652,   0.37674972,   0.33622174]])
Coordinates:
  * time     (time) datetime64[us] 16B 2000-01-01 2000-01-02
  * space    (space) <U2 24B 'IA' 'IL' 'IN'
Use a dictionary as the argument for positional or label-based indexing:
# index by integer array indices
da[dict(space=0, time=slice(None, 2))]
<xarray.DataArray (time: 2)> Size: 16B
array([0.12696983, 0.89723652])
Coordinates:
  * time     (time) datetime64[us] 16B 2000-01-01 2000-01-02
    space    <U2 8B 'IA'
# index by dimension coordinate labels
da.loc[dict(time=slice("2000-01-01", "2000-01-02"))]
<xarray.DataArray (time: 2, space: 3)> Size: 48B
array([[  0.12696983, -10.        , -10.        ],
       [  0.89723652,   0.37674972,   0.33622174]])
Coordinates:
  * time     (time) datetime64[us] 16B 2000-01-01 2000-01-02
  * space    (space) <U2 24B 'IA' 'IL' 'IN'
The arguments to these methods can be any objects that could index the array
along the dimension given by the keyword, e.g., labels for an individual value,
Python slice objects or 1-dimensional arrays.
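The sketch below shows each kind of argument with sel() on a small integer-labeled array (the name sample and its coordinate values are illustrative). The dict form at the end relies on sel() accepting its indexers as a mapping, which is useful when a dimension name is not a valid Python identifier.

```python
import numpy as np
import xarray as xr

sample = xr.DataArray(
    np.arange(12).reshape(4, 3),
    dims=["time", "space"],
    coords={"time": [10, 20, 30, 40], "space": ["IA", "IL", "IN"]},
)

single = sample.sel(space="IL")           # one label: the "space" dimension is dropped
window = sample.sel(time=slice(20, 30))   # slice: both end labels are included
subset = sample.sel(space=["IN", "IA"])   # 1-D list: "space" is kept, in the given order

# The same indexers passed as a dict instead of keyword arguments
same = sample.sel({"space": ["IN", "IA"]})
```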
Note
We would love to be able to do indexing with labeled dimension names inside
brackets, but unfortunately, Python does not yet support indexing with
keyword arguments like da[space=0].
Nearest neighbor lookups
The label-based selection methods sel(),
reindex() and reindex_like() all
support the method and tolerance keyword arguments. The method parameter enables
nearest neighbor (inexact) lookups by use of the methods 'pad',
'backfill' or 'nearest':
da = xr.DataArray([1, 2, 3], [("x", [0, 1, 2])])
da.sel(x=[1.1, 1.9], method="nearest")
<xarray.DataArray (x: 2)> Size: 16B
array([2, 3])
Coordinates:
  * x        (x) int64 16B 1 2
da.sel(x=0.1, method="backfill")
<xarray.DataArray ()> Size: 8B
array(2)
Coordinates:
    x        int64 8B 1
da.reindex(x=[0.5, 1, 1.5, 2, 2.5], method="pad")
<xarray.DataArray (x: 5)> Size: 40B
array([1, 2, 2, 3, 3])
Coordinates:
  * x        (x) float64 40B 0.5 1.0 1.5 2.0 2.5
Tolerance limits the maximum distance for valid matches with an inexact lookup:
da.reindex(x=[1.1, 1.5], method="nearest", tolerance=0.2)
<xarray.DataArray (x: 2)> Size: 16B
array([ 2., nan])
Coordinates:
  * x        (x) float64 16B 1.1 1.5
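The three methods differ only in which neighbouring label they pick. This sketch (the names line and targets are illustrative) applies each one to the same two targets and then shows that sel() with a tolerance raises KeyError when no label is close enough; the exact error message may vary between xarray versions.

```python
import xarray as xr

line = xr.DataArray([1, 2, 3], [("x", [0, 1, 2])])
targets = [0.4, 1.6]

# "pad": use the closest label at or below each target
pad = line.sel(x=targets, method="pad")
# "backfill": use the closest label at or above each target
backfill = line.sel(x=targets, method="backfill")
# "nearest": use whichever label is closest
nearest = line.sel(x=targets, method="nearest")

# With a tolerance, a target that is too far from every label is an error
try:
    line.sel(x=1.5, method="nearest", tolerance=0.2)
    raised = False
except KeyError:
    raised = True
```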
The method parameter is not yet supported if any of the arguments
to .sel() is a slice object:
da.sel(x=slice(1, 3), method="nearest")
NotImplementedError: cannot use ``method`` argument if any indexers are slice objects
However, you don’t need to use method to do inexact slicing. Slicing
already returns all values inside the range (inclusive), as long as the index
labels are monotonically increasing:
da.sel(x=slice(0.9, 3.1))
<xarray.DataArray (x: 2)> Size: 16B
array([2, 3])
Coordinates:
  * x        (x) int64 16B 1 2
Indexing axes with monotonically decreasing labels also works, as long as the
slice or .loc arguments are also decreasing:
reversed_da = da[::-1]
reversed_da.loc[3.1:0.9]
<xarray.DataArray (x: 2)> Size: 16B
array([3, 2])
Coordinates:
  * x        (x) int64 16B 2 1
Note
If you want to interpolate along coordinates rather than looking up the
nearest neighbors, use interp() and
interp_like().
See interpolation for the details.
Dataset indexing
We can also use these methods to index all variables in a dataset simultaneously, returning a new dataset:
da = xr.DataArray(
    np.random.rand(4, 3),
    [
        ("time", pd.date_range("2000-01-01", periods=4)),
        ("space", ["IA", "IL", "IN"]),
    ],
)
ds = da.to_dataset(name="foo")
ds.isel(space=[0], time=[0])
<xarray.Dataset> Size: 24B
Dimensions:  (time: 1, space: 1)
Coordinates:
  * time     (time) datetime64[us] 8B 2000-01-01
  * space    (space) <U2 8B 'IA'
Data variables:
    foo      (time, space) float64 8B 0.1294
ds.sel(time="2000-01-01")
<xarray.Dataset> Size: 56B
Dimensions:  (space: 3)
Coordinates:
  * space    (space) <U2 24B 'IA' 'IL' 'IN'
    time     datetime64[us] 8B 2000-01-01
Data variables:
    foo      (space) float64 24B 0.1294 0.8599 0.8204
Positional indexing on a dataset is not supported because the ordering of dimensions in a dataset is somewhat ambiguous (it can vary between different arrays). However, you can do normal indexing with dimension names:
ds[dict(space=[0], time=[0])]
<xarray.Dataset> Size: 24B
Dimensions:  (time: 1, space: 1)
Coordinates:
  * time     (time) datetime64[us] 8B 2000-01-01
  * space    (space) <U2 8B 'IA'
Data variables:
    foo      (time, space) float64 8B 0.1294
ds.loc[dict(time="2000-01-01")]
<xarray.Dataset> Size: 56B
Dimensions:  (space: 3)
Coordinates:
  * space    (space) <U2 24B 'IA' 'IL' 'IN'
    time     datetime64[us] 8B 2000-01-01
Data variables:
    foo      (space) float64 24B 0.1294 0.8599 0.8204
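When a Dataset holds variables with different dimensions, indexing by name only touches the variables that actually have that dimension. The following sketch assumes a hypothetical "elevation" variable with no time dimension and shows that it passes through isel(time=0) unchanged.

```python
import numpy as np
import xarray as xr

demo_ds = xr.Dataset(
    {
        "foo": (("time", "space"), np.arange(12.0).reshape(4, 3)),
        # hypothetical static variable without a "time" dimension
        "elevation": ("space", [100.0, 200.0, 300.0]),
    },
    coords={"time": [0, 1, 2, 3], "space": ["IA", "IL", "IN"]},
)

first = demo_ds.isel(time=0)
```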
Dropping labels and dimensions
The drop_sel() method returns a new object with the listed
index labels along a dimension dropped:
ds.drop_sel(space=["IN", "IL"])
<xarray.Dataset> Size: 72B
Dimensions:  (time: 4, space: 1)
Coordinates:
  * time     (time) datetime64[us] 32B 2000-01-01 2000-01-02 ... 2000-01-04
  * space    (space) <U2 8B 'IA'
Data variables:
    foo      (time, space) float64 32B 0.1294 0.3521 0.5948 0.2355
drop_sel is both a Dataset and DataArray method.
Use drop_dims() to drop a full dimension from a Dataset.
Any variables with these dimensions are also dropped:
ds.drop_dims("time")
<xarray.Dataset> Size: 24B
Dimensions:  (space: 3)
Coordinates:
  * space    (space) <U2 24B 'IA' 'IL' 'IN'
Data variables:
    *empty*
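For comparison, the sketch below drops the same entries by label with drop_sel() and by integer position with drop_isel(), then removes a whole dimension with drop_dims(). The "elevation" variable is a hypothetical addition so that something survives the last call.

```python
import numpy as np
import pandas as pd
import xarray as xr

demo_ds = xr.Dataset(
    {
        "foo": (("time", "space"), np.zeros((4, 3))),
        # hypothetical variable without a "time" dimension
        "elevation": ("space", [100.0, 200.0, 300.0]),
    },
    coords={
        "time": pd.date_range("2000-01-01", periods=4),
        "space": ["IA", "IL", "IN"],
    },
)

by_label = demo_ds.drop_sel(space=["IN", "IL"])  # drop by coordinate label
by_position = demo_ds.drop_isel(space=[1, 2])    # drop by integer position
no_time = demo_ds.drop_dims("time")              # drops "foo", keeps "elevation"
```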
Masking with where
Indexing methods on xarray objects generally return a subset of the original data.
However, it is sometimes useful to select an object with the same shape as the
original data, but with some elements masked. To do this type of selection in
xarray, use where():
da = xr.DataArray(np.arange(16).reshape(4, 4), dims=["x", "y"])
da.where(da.x + da.y < 4)
<xarray.DataArray (x: 4, y: 4)> Size: 128B
array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6., nan],
       [ 8.,  9., nan, nan],
       [12., nan, nan, nan]])
Dimensions without coordinates: x, y
This is particularly useful for ragged indexing of multi-dimensional data,
e.g., to apply a 2D mask to an image. Note that where follows all the
usual xarray broadcasting and alignment rules for binary operations (e.g.,
+) between the object being indexed and the condition, as described in
Computation:
da.where(da.y < 2)
<xarray.DataArray (x: 4, y: 4)> Size: 128B
array([[ 0.,  1., nan, nan],
       [ 4.,  5., nan, nan],
       [ 8.,  9., nan, nan],
       [12., 13., nan, nan]])
Dimensions without coordinates: x, y
By default where maintains the original size of the data. For cases
where the selected data is much smaller than the original data,
the option drop=True clips coordinate labels along which
all elements are masked:
da.where(da.y < 2, drop=True)
<xarray.DataArray (x: 4, y: 2)> Size: 64B
array([[ 0.,  1.],
       [ 4.,  5.],
       [ 8.,  9.],
       [12., 13.]])
Dimensions without coordinates: x, y
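Masked elements become NaN by default, which is why the integer input above came back as floats. As a small sketch (the name square is illustrative), where() also takes an other argument that supplies the fill value for masked elements instead of NaN.

```python
import numpy as np
import xarray as xr

square = xr.DataArray(np.arange(16).reshape(4, 4), dims=["x", "y"])

# Default: masked elements become NaN, so integer data is promoted to float
masked = square.where(square.x + square.y < 4)
# Passing other fills masked elements with that value instead
filled = square.where(square.x + square.y < 4, other=-1)
```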
Selecting values with isin
To check whether elements of an xarray object are equal to a single value, you can
compare with the equality operator == (e.g., arr == 3). To check against
multiple values, use isin():
da = xr.DataArray([1, 2, 3, 4, 5], dims=["x"])
da.isin([2, 4])
<xarray.DataArray (x: 5)> Size: 5B
array([False,  True, False,  True, False])
Dimensions without coordinates: x
isin() works particularly well with
where() to support indexing by arrays that are not
already labels of an array:
lookup = xr.DataArray([-1, -2, -3, -4, -5], dims=["x"])
da.where(lookup.isin([-2, -4]), drop=True)
<xarray.DataArray (x: 2)> Size: 16B
array([2., 4.])
Dimensions without coordinates: x
However, some caution is in order: when done repeatedly, this type of indexing
is significantly slower than using sel().
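When the values you are matching are already coordinate labels, the two approaches can be compared directly. In this sketch (the name labeled is illustrative), the mask-and-drop route returns floats because of the intermediate NaNs, while sel() looks the labels up in the index and keeps the integer dtype.

```python
import xarray as xr

labeled = xr.DataArray([10, 20, 30, 40, 50], coords={"x": [1, 2, 3, 4, 5]}, dims="x")

# Mask-and-drop: works for any condition, but evaluates a full boolean mask
# and promotes integers to float because masked values are NaN first
via_where = labeled.where(labeled.x.isin([2, 4]), drop=True)

# Label lookup: uses the index on "x" directly and keeps the integer dtype
via_sel = labeled.sel(x=[2, 4])
```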
Vectorized Indexing
Like numpy and pandas, xarray supports indexing many array elements at once in a vectorized manner.
If you only provide integers, slices, or unlabeled arrays (arrays without
dimension names, such as np.ndarray or list, but not
DataArray() or Variable()), indexing can be
understood as orthogonal. Each indexer component selects independently along
the corresponding dimension, similar to how vector indexing works in Fortran or
MATLAB, or after using the numpy.ix_() helper:
da = xr.DataArray(
    np.arange(12).reshape((3, 4)),
    dims=["x", "y"],
    coords={"x": [0, 1, 2], "y": ["a", "b", "c", "d"]},
)
da
<xarray.DataArray (x: 3, y: 4)> Size: 96B
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
Coordinates:
  * x        (x) int64 24B 0 1 2
  * y        (y) <U1 16B 'a' 'b' 'c' 'd'
da[[0, 2, 2], [1, 3]]
<xarray.DataArray (x: 3, y: 2)> Size: 48B
array([[ 1,  3],
       [ 9, 11],
       [ 9, 11]])
Coordinates:
  * x        (x) int64 24B 0 2 2
  * y        (y) <U1 8B 'b' 'd'
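The numpy.ix_() comparison can be checked directly. This sketch (names arr and sample are illustrative) indexes the same unlabeled lists both ways and confirms that the values agree.

```python
import numpy as np
import xarray as xr

arr = np.arange(12).reshape((3, 4))
sample = xr.DataArray(arr, dims=["x", "y"])

# Orthogonal indexing with plain lists in xarray ...
orthogonal = sample[[0, 2, 2], [1, 3]]
# ... matches NumPy once the lists are turned into an open mesh with np.ix_
with_ix = arr[np.ix_([0, 2, 2], [1, 3])]
```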
For more flexibility, you can supply DataArray() objects
as indexers.
Dimensions on resultant arrays are given by the ordered union of the indexers’
dimensions:
ind_x = xr.DataArray([0, 1], dims=["x"])
ind_y = xr.DataArray([0, 1], dims=["y"])
da[ind_x, ind_y]  # orthogonal indexing
<xarray.DataArray (x: 2, y: 2)> Size: 32B
array([[0, 1],
       [4, 5]])
Coordinates:
  * x        (x) int64 16B 0 1
  * y        (y) <U1 8B 'a' 'b'
Slices or sequences/arrays without named dimensions are treated as if they had the dimension along which they index:
# Because [0, 1] is used to index along dimension 'x',
# it is assumed to have dimension 'x'
da[[0, 1], ind_x]
<xarray.DataArray (x: 2)> Size: 16B
array([0, 5])
Coordinates:
  * x        (x) int64 16B 0 1
    y        (x) <U1 8B 'a' 'b'
Furthermore, you can use multi-dimensional DataArray() objects
as indexers, in which case the dimensions of the result are also determined by
the indexers’ dimensions:
ind = xr.DataArray([[0, 1], [0, 1]], dims=["a", "b"])
da[ind]
<xarray.DataArray (a: 2, b: 2, y: 4)> Size: 128B
array([[[0, 1, 2, 3],
        [4, 5, 6, 7]],
       [[0, 1, 2, 3],
        [4, 5, 6, 7]]])
Coordinates:
    x        (a, b) int64 32B 0 1 0 1
  * y        (y) <U1 16B 'a' 'b' 'c' 'd'
Dimensions without coordinates: a, b
Similar to how NumPy’s advanced indexing works, vectorized indexing for xarray is based on our broadcasting rules. See Indexing rules for the complete specification.
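As a small illustration of those broadcasting rules (names sample, ix and iy are illustrative), a 1-D indexer along "a" and a 2-D indexer over ("a", "b") are broadcast against each other, so each output element is picked pointwise from the broadcast indexers.

```python
import numpy as np
import xarray as xr

sample = xr.DataArray(np.arange(12).reshape((3, 4)), dims=["x", "y"])

ix = xr.DataArray([0, 2], dims=["a"])                 # 1-D indexer along "a"
iy = xr.DataArray([[0, 1], [2, 3]], dims=["a", "b"])  # 2-D indexer over ("a", "b")

# The indexers broadcast to dims ("a", "b"),
# so result[a, b] == sample[ix[a], iy[a, b]]
result = sample.isel(x=ix, y=iy)
```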
Vectorized indexing also works with isel, loc, and sel:
ind = xr.DataArray([[0, 1], [0, 1]], dims=["a", "b"])
da.isel(y=ind)  # same as da[:, ind]
<xarray.DataArray (x: 3, a: 2, b: 2)> Size: 96B
array([[[0, 1],
        [0, 1]],
       [[4, 5],
        [4, 5]],
       [[8, 9],
        [8, 9]]])
Coordinates:
  * x        (x) int64 24B 0 1 2
    y        (a, b) <U1 16B 'a' 'b' 'a' 'b'
Dimensions without coordinates: a, b
ind = xr.DataArray([["a", "b"], ["b", "a"]], dims=["a", "b"])
da.loc[:, ind]  # same as da.sel(y=ind)
<xarray.DataArray (x: 3, a: 2, b: 2)> Size: 96B
array([[[0, 1],
        [1, 0]],
       [[4, 5],
        [5, 4]],
       [[8, 9],
        [9, 8]]])
Coordinates:
  * x        (x) int64 24B 0 1 2
    y        (a, b) <U1 16B 'a' 'b' 'b' 'a'
Dimensions without coordinates: a, b
These methods may also be applied to Dataset objects:
ds = da.to_dataset(name="bar")
ds.isel(x=xr.DataArray([0, 1, 2], dims=["points"]))
<xarray.Dataset> Size: 136B
Dimensions:  (points: 3, y: 4)
Coordinates:
    x        (points) int64 24B 0 1 2
  * y        (y) <U1 16B 'a' 'b' 'c' 'd'
Dimensions without coordinates: points
Data variables:
    bar      (points, y) int64 96B 0 1 2 3 4 5 6 7 8 9 10 11
Vectorized indexing may be used to extract information from the nearest grid cells of interest, for example, the climate model grid cells nearest to a collection of specified weather station latitudes and longitudes. To trigger vectorized indexing behavior, you need to give the selection indexers a new, shared output dimension name. In the example below, the selections of the closest latitude and longitude are mapped to an output dimension named "points":
ds = xr.tutorial.open_dataset("air_temperature")
# Define target latitude and longitude (where weather stations might be)
target_lon = xr.DataArray([200, 201, 202, 205], dims="points")
target_lat = xr.DataArray([31, 41, 42, 42], dims="points")
# Retrieve data at the grid cells nearest to the target latitudes and longitudes
da = ds["air"].sel(lon=target_lon, lat=target_lat, method="nearest")
da
<xarray.DataArray 'air' (time: 2920, points: 4)> Size: 93kB
[11680 values with dtype=float64]
Coordinates:
  * time     (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
    lat      (points) float32 16B 30.0 40.0 42.5 42.5
    lon      (points) float32 16B 200.0 200.0 202.5 205.0
Dimensions without coordinates: points
Attributes:
    long_name:     4xDaily Air temperature at sigma level 995
    units:         degK
    precision:     2
    GRIB_id:       11
    GRIB_name:     TMP
    var_desc:      Air temperature
    dataset:       NMC Reanalysis
    level_desc:    Surface
    statistic:     Individual Obs
    parent_stat:   Other
    actual_range:  [185.16 322.1 ]
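The same pattern can be tried offline. The sketch below builds a hypothetical regular grid with random values in place of the tutorial dataset (so only the selected coordinates, not the data, are meaningful) and contrasts the shared "points" dimension with plain lists, which select orthogonally and return every latitude/longitude combination.

```python
import numpy as np
import xarray as xr

# Hypothetical 2.5-degree grid standing in for the tutorial dataset
lat = np.arange(20.0, 50.0, 2.5)
lon = np.arange(195.0, 210.0, 2.5)
air = xr.DataArray(
    np.random.default_rng(0).random((lat.size, lon.size)),
    coords={"lat": lat, "lon": lon},
    dims=["lat", "lon"],
)

target_lon = xr.DataArray([200, 201, 202, 205], dims="points")
target_lat = xr.DataArray([31, 41, 42, 42], dims="points")

# Shared "points" dimension: one grid cell per station (vectorized)
stations = air.sel(lon=target_lon, lat=target_lat, method="nearest")

# Plain lists instead: every latitude is paired with every longitude (orthogonal)
grid = air.sel(lon=[200, 201, 202, 205], lat=[31, 41, 42, 42], method="nearest")
```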
Tip
If you are lazily loading your data from disk, not every form of vectorized
indexing is supported (or if supported, may not be supported efficiently).
You may find increased performance by loading your data into memory first,
e.g., with load().
Note
If an indexer is a DataArray(), its coordinates should not
conflict with the selected subpart of the target array (except for the
explicitly indexed dimensions with .loc/.sel).
Otherwise, an IndexError will be raised.
Assigning values with indexing
To select and assign values to a portion of a DataArray(), you
can use indexing with .loc:
ds = xr.tutorial.open_dataset("air_temperature")
# add an empty 2D dataarray
ds["empty"] = xr.full_like(ds.air.mean("time"), fill_value=0)
# modify one grid point using loc()
ds["empty"].loc[dict(lon=260, lat=30)] = 100
# modify a 2D region using loc()
lc = ds.coords["lon"]
la = ds.coords["lat"]
ds["empty"].loc[
    dict(lon=lc[(lc > 220) & (lc < 260)], lat=la[(la > 20) & (la < 60)])
] = 100
or where():
# modify one grid point using xr.where()
ds["empty"] = xr.where(
    (ds.coords["lat"] == 20) & (ds.coords["lon"] == 260), 100, ds["empty"]
)
# or modify a 2D region using xr.where()
mask = (
    (ds.coords["lat"] > 20)
    & (ds.coords["lat"] < 60)
    & (ds.coords["lon"] > 220)
    & (ds.coords["lon"] < 260)
)
ds["empty"] = xr.where(mask, 100, ds["empty"])
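Both approaches update the same region; the difference is that .loc assigns in place while xr.where() returns a new object. The sketch below checks this on a small hypothetical grid (coordinate values chosen for illustration) instead of the tutorial dataset, so it runs without a download.

```python
import numpy as np
import xarray as xr

# Hypothetical small grid standing in for ds["empty"]
empty = xr.DataArray(
    np.zeros((4, 5)),
    coords={"lat": [20.0, 30.0, 40.0, 50.0], "lon": [220.0, 240.0, 260.0, 280.0, 300.0]},
    dims=["lat", "lon"],
)
lc, la = empty["lon"], empty["lat"]

# Region assignment in place with .loc (on a copy, to keep "empty" untouched)
via_loc = empty.copy()
via_loc.loc[dict(lon=lc[(lc > 220) & (lc < 260)], lat=la[(la > 20) & (la < 60)])] = 100

# The same region with xr.where, which returns a new object
mask = (la > 20) & (la < 60) & (lc > 220) & (lc < 260)
via_where = xr.where(mask, 100, empty)
```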
Vectorized indexing can also be used to assign values to xarray objects.
da = xr.DataArray(
    np.arange(12).reshape((3, 4)),
    dims=["x", "y"],
    coords={"x": [0, 1, 2], "y": ["a", "b", "c", "d"]},
)
da
<xarray.DataArray (x: 3, y: 4)> Size: 96B
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
Coordinates:
  * x        (x) int64 24B 0 1 2
  * y        (y) <U1 16B 'a' 'b' 'c' 'd'
da[0] = -1  # assignment with broadcasting
da
<xarray.DataArray (x: 3, y: 4)> Size: 96B
array([[-1, -1, -1, -1],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
Coordinates:
  * x        (x) int64 24B 0 1 2
  * y        (y) <U1 16B 'a' 'b' 'c' 'd'
ind_x = xr.DataArray([0, 1], dims=["x"])
ind_y = xr.DataArray([0, 1], dims=["y"])
# orthogonal: assigns -2 at (0, 0), (0, 1), (1, 0) and (1, 1)
da[ind_x, ind_y] = -2
da
<xarray.DataArray (x: 3, y: 4)> Size: 96B
array([[-2, -2, -1, -1],
       [-2, -2,  6,  7],
       [ 8,  9, 10, 11]])
Coordinates:
  * x        (x) int64 24B 0 1 2
  * y        (y) <U1 16B 'a' 'b' 'c' 'd'
da[ind_x, ind_y] += 100  # increment is also possible
da
<xarray.DataArray (x: 3, y: 4)> Size: 96B
array([[98, 98, -1, -1],
       [98, 98,  6,  7],
       [ 8,  9, 10, 11]])
Coordinates:
  * x        (x) int64 24B 0 1 2
  * y        (y) <U1 16B 'a' 'b' 'c' 'd'
Like numpy.ndarray, value assignment sometimes works differently from what one may expect.
da = xr.DataArray([0, 1, 2, 3], dims=["x"])
ind = xr.DataArray([0, 0, 0], dims=["x"])
da[ind] -= 1
da
<xarray.DataArray (x: 4)> Size: 32B
array([-1,  1,  2,  3])
Dimensions without coordinates: x
Here, 1 is subtracted from the 0th element only once.
This is because v[0] = v[0] - 1 is evaluated three times, each time starting from the original value, rather than
v[0] = v[0] - 1 - 1 - 1.
See Assigning values to indexed arrays for the details.
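If you do want repeated indices to accumulate, one workaround (a sketch, not an xarray API) is NumPy's unbuffered numpy.subtract.at applied to a copy of the values, which are then wrapped back with DataArray.copy(data=...). The names vec, repeat_ind and accumulated are illustrative.

```python
import numpy as np
import xarray as xr

vec = xr.DataArray([0, 1, 2, 3], dims=["x"])
repeat_ind = xr.DataArray([0, 0, 0], dims=["x"])

# Buffered fancy assignment: the repeated index takes effect only once
buffered = vec.copy()
buffered[repeat_ind] -= 1

# Unbuffered alternative: ufunc.at applies the subtraction once per index entry
values = vec.values.copy()
np.subtract.at(values, repeat_ind.values, 1)
accumulated = vec.copy(data=values)
```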
Note
Dask array does not support value assignment (see Parallel Computing with Dask for the details).
Note
Coordinates in both the left- and right-hand-side arrays should not
conflict with each other.
Otherwise, an IndexError will be raised.
Warning
Do not try to assign values when using any of the indexing methods isel
or sel:
# DO NOT do this
da.isel(space=0) = 0
Instead, values can be assigned using dictionary-based indexing:
da[dict(space=0)] = 0
Assigning values with chained indexing using .sel or .isel fails silently:
da = xr.DataArray([0, 1, 2, 3], dims=["x"])
# DO NOT do this
da.isel(x=[0, 1, 2])[1] = -1
da
<xarray.DataArray (x: 4)> Size: 32B
array([0, 1, 2, 3])
Dimensions without coordinates: x
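A minimal sketch of why the chained form has no effect and what to do instead (the names vec and subset are illustrative): isel() with a list of positions returns a new array, so the second step only edits that copy, whereas a single dictionary-based assignment on the original object sticks.

```python
import xarray as xr

vec = xr.DataArray([0, 1, 2, 3], dims=["x"])

# Chained: isel with a list returns a new array, so this edits only the copy
subset = vec.isel(x=[0, 1, 2])
subset[1] = -1
unchanged = vec.values.tolist()

# Single step on the original object: the assignment sticks
vec[dict(x=1)] = -1
```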
You can also assign values to all variables of a Dataset at once:
ds_org = xr.tutorial.open_dataset("eraint_uvz").isel(
    latitude=slice(56, 59), longitude=slice(255, 258), level=0
)
# set all values to 0
ds = xr.zeros_like(ds_org)
ds
<xarray.Dataset> Size: 468B
Dimensions:    (month: 2, latitude: 3, longitude: 3)
Coordinates:
  * month      (month) int32 8B 1 7
  * latitude   (latitude) float32 12B 48.0 47.25 46.5
  * longitude  (longitude) float32 12B 11.25 12.0 12.75
    level      int32 4B 200
Data variables:
    z          (month, latitude, longitude) float64 144B 0.0 0.0 0.0 ... 0.0 0.0
    u          (month, latitude, longitude) float64 144B 0.0 0.0 0.0 ... 0.0 0.0
    v          (month, latitude, longitude) float64 144B 0.0 0.0 0.0 ... 0.0 0.0
Attributes:
    Conventions:  CF-1.0
    Info:         Monthly ERA-Interim data. Downloaded and edited by fabien.m...
# by integer
ds[dict(latitude=2, longitude=2)] = 1
ds["u"]
<xarray.DataArray 'u' (month: 2, latitude: 3, longitude: 3)> Size: 144B
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 1.]],

       [[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 1.]]])
Coordinates:
  * month      (month) int32 8B 1 7
  * latitude   (latitude) float32 12B 48.0 47.25 46.5
  * longitude  (longitude) float32 12B 11.25 12.0 12.75
    level      int32 4B 200
Attributes:
    number_of_significant_digits:  2
    units:                         m s**-1
    long_name:                     U component of wind
    standard_name:                 eastward_wind
ds["v"]
<xarray.DataArray 'v' (month: 2, latitude: 3, longitude: 3)> Size: 144B
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 1.]],

       [[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 1.]]])
Coordinates:
  * month      (month) int32 8B 1 7
  * latitude   (latitude) float32 12B 48.0 47.25 46.5
  * longitude  (longitude) float32 12B 11.25 12.0 12.75
    level      int32 4B 200
Attributes:
    number_of_significant_digits:  2
    units:                         m s**-1
    long_name:                     V component of wind
    standard_name:                 northward_wind
# by label
ds.loc[dict(latitude=47.25, longitude=[11.25, 12])] = 100
ds["u"]
<xarray.DataArray 'u' (month: 2, latitude: 3, longitude: 3)> Size: 144B
array([[[  0.,   0.,   0.],
        [100., 100.,   0.],
        [  0.,   0.,   1.]],

       [[  0.,   0.,   0.],
        [100., 100.,   0.],
        [  0.,   0.,   1.]]])
Coordinates:
  * month      (month) int32 8B 1 7
  * latitude   (latitude) float32 12B 48.0 47.25 46.5
  * longitude  (longitude) float32 12B 11.25 12.0 12.75
    level      int32 4B 200
Attributes:
    number_of_significant_digits:  2
    units:                         m s**-1
    long_name:                     U component of wind
    standard_name:                 eastward_wind
# dataset as new values
new_dat = ds_org.loc[dict(latitude=48, longitude=[11.25, 12])]
new_dat
<xarray.Dataset> Size: 120B
Dimensions:    (month: 2, longitude: 2)
Coordinates:
  * month      (month) int32 8B 1 7
  * longitude  (longitude) float32 8B 11.25 12.0
    latitude   float32 4B 48.0
    level      int32 4B 200
Data variables:
    z          (month, longitude) float64 32B 1.136e+05 1.136e+05 ... 1.187e+05
    u          (month, longitude) float64 32B 12.75 12.69 14.87 14.62
    v          (month, longitude) float64 32B -7.891 -7.781 -1.875 -1.984
Attributes:
    Conventions:  CF-1.0
    Info:         Monthly ERA-Interim data. Downloaded and edited by fabien.m...
ds.loc[dict(latitude=47.25, longitude=[11.25, 12])] = new_dat
ds["u"]
<xarray.DataArray 'u' (month: 2, latitude: 3, longitude: 3)> Size: 144B
array([[[ 0.        ,  0.        ,  0.        ],
        [12.74992466, 12.68701646,  0.        ],
        [ 0.        ,  0.        ,  1.        ]],

       [[ 0.        ,  0.        ,  0.        ],
        [14.87464903, 14.62458894,  0.        ],
        [ 0.        ,  0.        ,  1.        ]]])
Coordinates:
  * month      (month) int32 8B 1 7
  * latitude   (latitude) float32 12B 48.0 47.25 46.5
  * longitude  (longitude) float32 12B 11.25 12.0 12.75
    level      int32 4B 200
Attributes:
    number_of_significant_digits:  2
    units:                         m s**-1
    long_name:                     U component of wind
    standard_name:                 eastward_wind
The dimensions can differ between the variables in the dataset, but every variable must have at least the dimensions named in the indexer dictionary.
The new values must be a scalar, a DataArray, or a Dataset that contains all of the variables that appear in the dataset being modified.
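A minimal sketch of these rules, using a small hypothetical Dataset (the variables a, b and c and their coordinates are invented for illustration). The variable b carries an extra dimension t, which is allowed, but c lacks the y dimension named in the indexer, so an assignment through that dictionary is rejected. The sketch assumes the rejection surfaces as a ValueError, which is what recent xarray versions raise here.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset: "a" is (x, y), "b" adds an extra "t" dimension.
ds = xr.Dataset(
    {
        "a": (("x", "y"), np.zeros((2, 3))),
        "b": (("x", "y", "t"), np.zeros((2, 3, 4))),
    },
    coords={"x": [10, 20], "y": [1.0, 2.0, 3.0]},
)

# Both variables have x and y, so this label-based dict indexer applies to each.
ds.loc[dict(x=20, y=[1.0, 2.0])] = 5
print(ds["a"].values)

# Hypothetical variable "c" has no "y" dimension, so the same indexer cannot
# be applied to it and the whole assignment is rejected before anything changes.
ds_bad = ds.assign(c=("x", np.zeros(2)))
try:
    ds_bad.loc[dict(x=20, y=[1.0, 2.0])] = 7
    rejected = False
except ValueError:
    rejected = True
print("rejected:", rejected)  # rejected: True
```

The check happens for every variable before any value is written, so a failed assignment leaves the dataset untouched.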
More advanced indexing
Using DataArray objects as indexers enables very
flexible indexing. The following is an example of pointwise indexing:
da = xr.DataArray(np.arange(56).reshape((7, 8)), dims=["x", "y"])
da
<xarray.DataArray (x: 7, y: 8)> Size: 448B
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31],
       [32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47],
       [48, 49, 50, 51, 52, 53, 54, 55]])
Dimensions without coordinates: x, y
da.isel(x=xr.DataArray([0, 1, 6], dims="z"), y=xr.DataArray([0, 1, 0], dims="z"))
<xarray.DataArray (z: 3)> Size: 24B
array([ 0,  9, 48])
Dimensions without coordinates: z
Here, the three elements at (ix, iy) = ((0, 0), (1, 1), (6, 0)) are selected
and mapped along a new dimension z.
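Plain lists and DataArray indexers behave differently here. A list is applied independently along its own dimension (orthogonal, or outer, indexing), whereas DataArray indexers that share a dimension are applied pointwise. The sketch below rebuilds the 7-by-8 array from above and checks both results against the equivalent NumPy expressions (np.ix_ for the outer case, paired integer arrays for the pointwise case).

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(56).reshape((7, 8)), dims=["x", "y"])
rows, cols = [0, 1, 6], [0, 1, 0]

# Plain lists: orthogonal (outer) indexing, i.e. every row/column combination.
outer = da.isel(x=rows, y=cols)
print(outer.dims, outer.shape)  # ('x', 'y') (3, 3)
assert np.array_equal(outer.values, da.values[np.ix_(rows, cols)])

# DataArray indexers that share the dimension "z": pointwise (vectorized) indexing.
points = da.isel(x=xr.DataArray(rows, dims="z"), y=xr.DataArray(cols, dims="z"))
print(points.dims, points.values.tolist())  # ('z',) [0, 9, 48]
assert np.array_equal(points.values, da.values[rows, cols])
```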
If you want to add a coordinate to the new dimension z,
you can supply a DataArray with a coordinate:
da.isel(
    x=xr.DataArray([0, 1, 6], dims="z", coords={"z": ["a", "b", "c"]}),
    y=xr.DataArray([0, 1, 0], dims="z"),
)
<xarray.DataArray (z: 3)> Size: 24B
array([ 0,  9, 48])
Coordinates:
  * z        (z) <U1 12B 'a' 'b' 'c'
Analogously, label-based pointwise indexing is also possible with the .sel
method:
da = xr.DataArray(
    np.random.rand(4, 3),
    [
        ("time", pd.date_range("2000-01-01", periods=4)),
        ("space", ["IA", "IL", "IN"]),
    ],
)
times = xr.DataArray(
    pd.to_datetime(["2000-01-03", "2000-01-02", "2000-01-01"]), dims="new_time"
)
da.sel(space=xr.DataArray(["IA", "IL", "IN"], dims=["new_time"]), time=times)
<xarray.DataArray (new_time: 3)> Size: 24B
array([0.9195404 , 0.34044494, 0.590426  ])
Coordinates:
  * new_time  (new_time) datetime64[us] 24B 2000-01-03 2000-01-02 2000-01-01
    time      (new_time) datetime64[us] 24B 2000-01-03 2000-01-02 2000-01-01
    space     (new_time) <U2 24B 'IA' 'IL' 'IN'
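Vectorized label-based selection can also be combined with inexact matching. As a sketch, assume a hypothetical gridded field with lat and lon coordinates and two hypothetical station locations that do not fall exactly on grid points: passing the station coordinates as DataArray indexers along a shared station dimension, together with method="nearest", returns the value of the nearest grid cell for each station.

```python
import numpy as np
import xarray as xr

# Hypothetical 3 x 4 grid with latitude/longitude labels.
grid = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    dims=("lat", "lon"),
    coords={"lat": [10.0, 20.0, 30.0], "lon": [0.0, 1.0, 2.0, 3.0]},
)

# Hypothetical station positions, one entry per station along a new "station" dim.
station_lat = xr.DataArray([11.0, 29.0], dims="station")
station_lon = xr.DataArray([2.9, 0.2], dims="station")

# Pointwise nearest-neighbour lookup: one value per station.
at_stations = grid.sel(lat=station_lat, lon=station_lon, method="nearest")
print(at_stations.dims)  # ('station',)
print(at_stations.values.tolist())  # [3.0, 8.0]
print(at_stations["lat"].values.tolist())  # [10.0, 30.0]
```

The matched grid labels are kept as coordinates along station, which makes it easy to check how far each station is from the cell it was assigned to.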
Align and reindex
Xarray’s reindex, reindex_like and align impose a DataArray or
Dataset onto a new set of coordinates corresponding to dimensions. The
original values are subset to the index labels still found in the new labels,
and values corresponding to new labels not found in the original object are
filled with NaN.
Xarray operations that combine multiple objects generally align their arguments automatically so that they share the same indexes. However, manual alignment can be useful for greater control and for better performance.
To reindex a particular dimension, use reindex():
da.reindex(space=["IA", "CA"])
<xarray.DataArray (time: 4, space: 2)> Size: 64B
array([[0.57401177,        nan],
       [0.24534982,        nan],
       [0.9195404 ,        nan],
       [0.75356885,        nan]])
Coordinates:
  * time     (time) datetime64[us] 32B 2000-01-01 2000-01-02 ... 2000-01-04
  * space    (space) <U2 16B 'IA' 'CA'
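NaN is only the default. As a minimal sketch on a small hypothetical array, reindex also accepts fill_value, which sets the value used for new labels, and method, which copies values from existing labels instead (method="nearest" is shown here; inexact methods require a monotonic index).

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1.0, 2.0, 3.0], dims="x", coords={"x": [10, 20, 30]})

# Labels 20 and 30 are kept; 40 is new and filled with NaN by default.
default = da.reindex(x=[20, 30, 40])
# fill_value replaces NaN for labels that were not in the original index.
zeros = da.reindex(x=[20, 30, 40], fill_value=0.0)
# method="nearest" copies the value at the closest existing label instead.
nearest = da.reindex(x=[12, 29], method="nearest")

print(default.values.tolist())  # [2.0, 3.0, nan]
print(zeros.values.tolist())  # [2.0, 3.0, 0.0]
print(nearest.values.tolist())  # [1.0, 3.0]
```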
The reindex_like() method is a useful shortcut.
To demonstrate, we will make a subset DataArray with new values:
foo = da.rename("foo")
baz = (10 * da[:2, :2]).rename("baz")
baz
<xarray.DataArray 'baz' (time: 2, space: 2)> Size: 32B
array([[5.74011775, 0.61269962],
       [2.45349819, 3.40444937]])
Coordinates:
  * time     (time) datetime64[us] 16B 2000-01-01 2000-01-02
  * space    (space) <U2 16B 'IA' 'IL'
Reindexing foo with baz selects out the first two values along each
dimension:
foo.reindex_like(baz)
<xarray.DataArray 'foo' (time: 2, space: 2)> Size: 32B
array([[0.57401177, 0.06126996],
       [0.24534982, 0.34044494]])
Coordinates:
  * time     (time) datetime64[us] 16B 2000-01-01 2000-01-02
  * space    (space) <U2 16B 'IA' 'IL'
The opposite operation asks us to reindex to a larger shape, so we fill in
the missing values with NaN:
baz.reindex_like(foo)
<xarray.DataArray 'baz' (time: 4, space: 3)> Size: 96B
array([[5.74011775, 0.61269962,        nan],
       [2.45349819, 3.40444937,        nan],
       [       nan,        nan,        nan],
       [       nan,        nan,        nan]])
Coordinates:
  * time     (time) datetime64[us] 32B 2000-01-01 2000-01-02 ... 2000-01-04
  * space    (space) <U2 24B 'IA' 'IL' 'IN'
The align() function lets us perform more flexible database-like
'inner', 'outer', 'left' and 'right' joins:
xr.align(foo, baz, join="inner")
(<xarray.DataArray 'foo' (time: 2, space: 2)> Size: 32B
 array([[0.57401177, 0.06126996],
        [0.24534982, 0.34044494]])
 Coordinates:
   * time     (time) datetime64[us] 16B 2000-01-01 2000-01-02
   * space    (space) <U2 16B 'IA' 'IL',
 <xarray.DataArray 'baz' (time: 2, space: 2)> Size: 32B
 array([[5.74011775, 0.61269962],
        [2.45349819, 3.40444937]])
 Coordinates:
   * time     (time) datetime64[us] 16B 2000-01-01 2000-01-02
   * space    (space) <U2 16B 'IA' 'IL')
xr.align(foo, baz, join="outer")
(<xarray.DataArray 'foo' (time: 4, space: 3)> Size: 96B
 array([[0.57401177, 0.06126996, 0.590426  ],
        [0.24534982, 0.34044494, 0.98472874],
        [0.9195404 , 0.03777169, 0.86154929],
        [0.75356885, 0.40517876, 0.34352588]])
 Coordinates:
   * time     (time) datetime64[us] 32B 2000-01-01 2000-01-02 ... 2000-01-04
   * space    (space) <U2 24B 'IA' 'IL' 'IN',
 <xarray.DataArray 'baz' (time: 4, space: 3)> Size: 96B
 array([[5.74011775, 0.61269962,        nan],
        [2.45349819, 3.40444937,        nan],
        [       nan,        nan,        nan],
        [       nan,        nan,        nan]])
 Coordinates:
   * time     (time) datetime64[us] 32B 2000-01-01 2000-01-02 ... 2000-01-04
   * space    (space) <U2 24B 'IA' 'IL' 'IN')
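To make the join types concrete, here is a sketch with two hypothetical arrays whose x labels only partially overlap; it records the labels each join keeps. The "exact" join, which raises instead of reindexing, is included for comparison. Newer xarray versions raise an AlignmentError, which should be a subclass of ValueError, so the sketch catches ValueError as the portable choice.

```python
import xarray as xr

# Two hypothetical arrays whose "x" labels only partly overlap.
a = xr.DataArray([1, 2, 3], dims="x", coords={"x": [0, 1, 2]})
b = xr.DataArray([10, 20, 30], dims="x", coords={"x": [1, 2, 3]})

labels = {}
for join in ["inner", "outer", "left", "right"]:
    a2, b2 = xr.align(a, b, join=join)
    labels[join] = a2["x"].values.tolist()
    print(join, labels[join])
# inner [1, 2]
# outer [0, 1, 2, 3]
# left [0, 1, 2]
# right [1, 2, 3]

# join="exact" refuses to reindex and raises when the labels differ.
exact_failed = False
try:
    xr.align(a, b, join="exact")
except ValueError:
    exact_failed = True
print("exact join raised:", exact_failed)  # exact join raised: True
```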
Both reindex_like and align work interchangeably between
DataArray and Dataset objects, and with any number of matching dimension names:
ds
<xarray.Dataset> Size: 468B
Dimensions:    (month: 2, latitude: 3, longitude: 3)
Coordinates:
  * month      (month) int32 8B 1 7
  * latitude   (latitude) float32 12B 48.0 47.25 46.5
  * longitude  (longitude) float32 12B 11.25 12.0 12.75
    level      int32 4B 200
Data variables:
    z          (month, latitude, longitude) float64 144B 0.0 0.0 0.0 ... 0.0 1.0
    u          (month, latitude, longitude) float64 144B 0.0 0.0 0.0 ... 0.0 1.0
    v          (month, latitude, longitude) float64 144B 0.0 0.0 0.0 ... 0.0 1.0
Attributes:
    Conventions:  CF-1.0
    Info:         Monthly ERA-Interim data. Downloaded and edited by fabien.m...
ds.reindex_like(baz)
<xarray.Dataset> Size: 468B
Dimensions:    (month: 2, latitude: 3, longitude: 3)
Coordinates:
  * month      (month) int32 8B 1 7
  * latitude   (latitude) float32 12B 48.0 47.25 46.5
  * longitude  (longitude) float32 12B 11.25 12.0 12.75
    level      int32 4B 200
Data variables:
    z          (month, latitude, longitude) float64 144B 0.0 0.0 0.0 ... 0.0 1.0
    u          (month, latitude, longitude) float64 144B 0.0 0.0 0.0 ... 0.0 1.0
    v          (month, latitude, longitude) float64 144B 0.0 0.0 0.0 ... 0.0 1.0
Attributes:
    Conventions:  CF-1.0
    Info:         Monthly ERA-Interim data. Downloaded and edited by fabien.m...
other = xr.DataArray(["a", "b", "c"], dims="other")
# this is a no-op, because there are no shared dimension names
ds.reindex_like(other)
<xarray.Dataset> Size: 468B
Dimensions:    (month: 2, latitude: 3, longitude: 3)
Coordinates:
  * month      (month) int32 8B 1 7
  * latitude   (latitude) float32 12B 48.0 47.25 46.5
  * longitude  (longitude) float32 12B 11.25 12.0 12.75
    level      int32 4B 200
Data variables:
    z          (month, latitude, longitude) float64 144B 0.0 0.0 0.0 ... 0.0 1.0
    u          (month, latitude, longitude) float64 144B 0.0 0.0 0.0 ... 0.0 1.0
    v          (month, latitude, longitude) float64 144B 0.0 0.0 0.0 ... 0.0 1.0
Attributes:
    Conventions:  CF-1.0
    Info:         Monthly ERA-Interim data. Downloaded and edited by fabien.m...
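A short sketch of mixing the two types, using a hypothetical Dataset and DataArray that share only part of their time labels: the Dataset is reindexed onto the DataArray's labels, and align returns one aligned object per input, each keeping its original type.

```python
import math

import xarray as xr

# Hypothetical Dataset and DataArray that share only part of their "time" labels.
ds = xr.Dataset(
    {"temp": ("time", [1.0, 2.0, 3.0]), "rain": ("time", [0.0, 0.5, 0.0])},
    coords={"time": [1, 2, 3]},
)
weights = xr.DataArray([10.0, 20.0, 30.0], dims="time", coords={"time": [2, 3, 4]})

# A Dataset can be reindexed like a DataArray: every variable adopts time = [2, 3, 4].
on_weights = ds.reindex_like(weights)
print(on_weights["temp"].values.tolist())  # [2.0, 3.0, nan]

# align accepts a mix of types and returns one aligned object per input.
ds_inner, weights_inner = xr.align(ds, weights, join="inner")
print(type(ds_inner).__name__, type(weights_inner).__name__)  # Dataset DataArray
print(ds_inner["time"].values.tolist())  # [2, 3]
```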
Missing coordinate labels
Coordinate labels for each dimension are optional (as of xarray v0.9). Label-based
indexing with .sel and .loc uses standard positional,
integer-based indexing as a fallback for dimensions without a coordinate label:
da = xr.DataArray([1, 2, 3], dims="x")
da.sel(x=[0, -1])
<xarray.DataArray (x: 2)> Size: 16B
array([1, 3])
Dimensions without coordinates: x
Alignment between xarray objects where one or both do not have coordinate labels succeeds only if all dimensions of the same name have the same length. Otherwise, it raises an informative error:
xr.align(da, da[:2])
AlignmentError: cannot reindex or align along dimension 'x' because of conflicting dimension sizes: {2, 3}
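The sketch below (using a hypothetical small array) contrasts the positional fallback with label lookup once coordinates are attached via assign_coords, and checks both alignment outcomes described above. The length mismatch is caught as ValueError, on the assumption that AlignmentError subclasses ValueError in the installed version (older versions raise a plain ValueError).

```python
import xarray as xr

da = xr.DataArray([1, 2, 3], dims="x")

# Without a coordinate on "x", .sel falls back to positions: -1 is the last element.
print(da.sel(x=[0, -1]).values.tolist())  # [1, 3]

# After attaching labels, the same method looks values up by label instead.
labeled = da.assign_coords(x=[10, 20, 30])
print(labeled.sel(x=[10, 30]).values.tolist())  # [1, 3]

# Aligning with an unlabeled object works when the dimension lengths match...
aligned_labeled, aligned_plain = xr.align(labeled, da)
print(aligned_plain.sizes["x"])  # 3

# ...and fails when they differ.
mismatch_rejected = False
try:
    xr.align(da, da[:2])
except ValueError:
    mismatch_rejected = True
print("length mismatch rejected:", mismatch_rejected)  # length mismatch rejected: True
```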
Underlying Indexes
Xarray uses pandas.Index objects internally to perform indexing
operations. If you need to access the underlying indexes, they are available
through the indexes attribute.
da = xr.DataArray(
    np.random.rand(4, 3),
    [
        ("time", pd.date_range("2000-01-01", periods=4)),
        ("space", ["IA", "IL", "IN"]),
    ],
)
da
<xarray.DataArray (time: 4, space: 3)> Size: 96B
array([[0.17091717, 0.39465901, 0.64166617],
       [0.27459243, 0.46235433, 0.87137165],
       [0.40113122, 0.61058827, 0.11796713],
       [0.70218436, 0.41403366, 0.34234521]])
Coordinates:
  * time     (time) datetime64[us] 32B 2000-01-01 2000-01-02 ... 2000-01-04
  * space    (space) <U2 24B 'IA' 'IL' 'IN'
da.indexes
Indexes:
    time     DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04'], dtype='datetime64[us]', name='time', freq='D')
    space    Index(['IA', 'IL', 'IN'], dtype='str', name='space')
da.indexes["time"]
DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04'], dtype='datetime64[us]', name='time', freq='D')
Use get_index() to get an index for a dimension,
falling back to a default pandas.RangeIndex if it has no coordinate
labels:
da = xr.DataArray([1, 2, 3], dims="x")
da
<xarray.DataArray (x: 3)> Size: 24B
array([1, 2, 3])
Dimensions without coordinates: x
da.get_index("x")
RangeIndex(start=0, stop=3, step=1, name='x')
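Because these are ordinary pandas index objects, pandas methods can be called on them directly. A sketch, assuming a DataArray built like the one above but with deterministic values: get_loc and get_indexer (standard pandas.Index methods) translate labels to positions, and get_index supplies a RangeIndex for an unlabeled dimension.

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(
    np.arange(12.0).reshape(4, 3),
    [
        ("time", pd.date_range("2000-01-01", periods=4)),
        ("space", ["IA", "IL", "IN"]),
    ],
)

# .indexes returns pandas objects, so pandas methods work on them directly.
time_index = da.indexes["time"]
print(isinstance(time_index, pd.DatetimeIndex))  # True
print(time_index.get_loc(pd.Timestamp("2000-01-03")))  # 2

# get_indexer translates several labels into positions in one call.
positions = da.indexes["space"].get_indexer(["IN", "IA"])
print(positions.tolist())  # [2, 0]

# get_index falls back to a RangeIndex for a dimension without coordinate labels.
unlabeled = xr.DataArray([1, 2, 3], dims="x")
print(unlabeled.get_index("x"))
```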
Copies vs. Views
Whether array indexing returns a view or a copy of the underlying data depends on the nature of the labels.
For positional (integer) indexing, xarray follows the same rules as NumPy:
Positional indexing with only integers and slices returns a view.
Positional indexing with arrays or lists returns a copy.
The rules for label-based indexing are more complex:
Label-based indexing with only slices returns a view.
Label-based indexing with arrays returns a copy.
Label-based indexing with scalars returns a view or a copy, depending on whether the corresponding positional indexer can be represented as an integer or a slice object. The exact rules are determined by pandas.
Whether data is a copy or a view is more predictable in xarray than in pandas, so unlike pandas, xarray does not produce SettingWithCopy warnings. However, you should still avoid assignment with chained indexing.
Note that other operations (such as accessing the values property) may also return views rather than copies.
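A minimal sketch of these rules for in-memory NumPy-backed data (other backends, such as lazily loaded or dask-backed arrays, may behave differently). np.shares_memory reports whether a result still points at the original buffer, and the last part shows why chained assignment should be avoided while a single indexing step works.

```python
import numpy as np
import xarray as xr

# Hypothetical 2 x 3 array backed by an in-memory NumPy array.
da = xr.DataArray(np.arange(6.0).reshape(2, 3), dims=("x", "y"))

# Integers and slices only: the result is a view on da's memory.
view = da.isel(x=0, y=slice(0, 2))
# A list indexer: the result is a copy.
copied = da.isel(x=0, y=[0, 1])
print(np.shares_memory(view.values, da.values))  # True
print(np.shares_memory(copied.values, da.values))  # False

# Chained assignment writes into a temporary copy and leaves da unchanged.
da.isel(y=[0, 1])[0, 0] = -1.0
print(float(da[0, 0]))  # 0.0

# A single indexing step assigns into da itself (and is visible through the view).
da[dict(x=0, y=[0, 1])] = -1.0
print(da.values[0].tolist())  # [-1.0, -1.0, 2.0]
print(view.values.tolist())  # [-1.0, -1.0]
```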
Multi-level indexing
Just like pandas, advanced indexing on multi-level indexes is possible with
loc and sel. You can slice a multi-index by providing multiple indexers,
i.e., a tuple of slices, labels, list of labels, or any selector allowed by
pandas:
midx = pd.MultiIndex.from_product([list("abc"), [0, 1]], names=("one", "two"))
mda = xr.DataArray(np.random.rand(6, 3), [("x", midx), ("y", range(3))])
mda
<xarray.DataArray (x: 6, y: 3)> Size: 144B
array([[0.59592532, 0.19986426, 0.09973676],
       [0.73459622, 0.01654451, 0.4813845 ],
       [0.09593887, 0.49730633, 0.83879627],
       [0.89733326, 0.73259152, 0.75872436],
       [0.56065718, 0.47147793, 0.13876812],
       [0.09446113, 0.94225634, 0.13409924]])
Coordinates:
  * x        (x) object 48B MultiIndex
  * one      (x) object 48B 'a' 'a' 'b' 'b' 'c' 'c'
  * two      (x) int64 48B 0 1 0 1 0 1
  * y        (y) int64 24B 0 1 2
mda.sel(x=(list("ab"), [0]))
<xarray.DataArray (x: 2, y: 3)> Size: 48B
array([[0.59592532, 0.19986426, 0.09973676],
       [0.09593887, 0.49730633, 0.83879627]])
Coordinates:
  * x        (x) object 16B MultiIndex
  * one      (x) object 16B 'a' 'b'
  * two      (x) int64 16B 0 0
  * y        (y) int64 24B 0 1 2
You can also select multiple elements by providing a list of labels or tuples or a slice of tuples:
mda.sel(x=[("a", 0), ("b", 1)])
<xarray.DataArray (x: 2, y: 3)> Size: 48B
array([[0.59592532, 0.19986426, 0.09973676],
       [0.89733326, 0.73259152, 0.75872436]])
Coordinates:
  * x        (x) object 16B MultiIndex
  * one      (x) object 16B 'a' 'b'
  * two      (x) int64 16B 0 1
  * y        (y) int64 24B 0 1 2
Additionally, xarray supports dictionaries:
mda.sel(x={"one": "a", "two": 0})
<xarray.DataArray (y: 3)> Size: 24B
array([0.59592532, 0.19986426, 0.09973676])
Coordinates:
  * y        (y) int64 24B 0 1 2
    x        object 8B ('a', np.int64(0))
    one      <U1 4B 'a'
    two      int64 8B 0
For convenience, sel also accepts multi-index levels directly
as keyword arguments:
mda.sel(one="a", two=0)
<xarray.DataArray (y: 3)> Size: 24B
array([0.59592532, 0.19986426, 0.09973676])
Coordinates:
  * y        (y) int64 24B 0 1 2
    x        object 8B ('a', np.int64(0))
    one      <U1 4B 'a'
    two      int64 8B 0
Note that with sel it is not possible to mix a dimension
indexer with level indexers for that dimension
(e.g., mda.sel(x={'one': 'a'}, two=0) will raise a ValueError).
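The three selection styles above should agree. The sketch below rebuilds mda with deterministic values (np.arange instead of random numbers, an assumption made so the output is checkable), compares a full tuple, a dictionary and level keyword arguments, and triggers the ValueError from mixing a dimension indexer with a level indexer.

```python
import numpy as np
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_product([list("abc"), [0, 1]], names=("one", "two"))
# Deterministic values: row i holds 3*i, 3*i + 1, 3*i + 2.
mda = xr.DataArray(np.arange(18).reshape(6, 3), [("x", midx), ("y", range(3))])

by_tuple = mda.sel(x=("a", 0))
by_dict = mda.sel(x={"one": "a", "two": 0})
by_levels = mda.sel(one="a", two=0)
print(by_tuple.values.tolist(), by_dict.values.tolist(), by_levels.values.tolist())
# [0, 1, 2] [0, 1, 2] [0, 1, 2]

# Mixing the dimension name with one of its level names is rejected.
mixed_rejected = False
try:
    mda.sel(x={"one": "a"}, two=0)
except ValueError:
    mixed_rejected = True
print("mixed indexers rejected:", mixed_rejected)  # mixed indexers rejected: True
```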
Like pandas, xarray handles partial selection on a multi-index (dropping the selected levels). As shown below, it also renames the dimension / coordinate when the multi-index is reduced to a single index.
mda.loc[{"one": "a"}, ...]
<xarray.DataArray (two: 2, y: 3)> Size: 48B
array([[0.59592532, 0.19986426, 0.09973676],
       [0.73459622, 0.01654451, 0.4813845 ]])
Coordinates:
  * two      (two) int64 16B 0 1
  * y        (y) int64 24B 0 1 2
    one      <U1 4B 'a'
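The same level drop can be requested with sel. In this sketch, mda is rebuilt with deterministic values (an assumption for checkability); selecting a single value of the one level leaves only two, so the result is indexed by a two dimension instead of x.

```python
import numpy as np
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_product([list("abc"), [0, 1]], names=("one", "two"))
mda = xr.DataArray(np.arange(18).reshape(6, 3), [("x", midx), ("y", range(3))])

# Selecting one value of the "one" level leaves only the "two" level, so the
# multi-index collapses and the dimension is renamed after that level.
sub = mda.sel(x={"one": "b"})
print(sub.dims)  # ('two', 'y')
print(sub["two"].values.tolist())  # [0, 1]
print(sub.values.tolist())  # [[6, 7, 8], [9, 10, 11]]
```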
Unlike pandas, xarray does not guess whether you provide index levels or
dimensions when using loc in some ambiguous cases. For example, for
mda.loc[{'one': 'a', 'two': 0}] and mda.loc['a', 0] xarray
always interprets (‘one’, ‘two’) and (‘a’, 0) as the names and
labels of the first and second dimensions, respectively. You must specify all
dimensions or use the ellipsis in the loc specifier, e.g. in the example
above, mda.loc[{'one': 'a', 'two': 0}, :] or mda.loc[('a', 0), ...].
Indexing rules
Here we describe the full rules xarray uses for vectorized indexing. Note that this is for the purposes of explanation: for the sake of efficiency and to support various backends, the actual implementation is different.
1. (Only for label-based indexing.) Look up positional indexes along each dimension from the corresponding pandas.Index.
2. A full slice object : is inserted for each dimension without an indexer.
3. slice objects are converted into arrays, given by np.arange(*slice.indices(...)).
4. Assume dimension names for array indexers without dimensions, such as np.ndarray and list, from the dimensions to be indexed along. For example, v.isel(x=[0, 1]) is understood as v.isel(x=xr.DataArray([0, 1], dims=['x'])).
5. For each variable in a Dataset or DataArray (the array and its coordinates):
   a. Broadcast all relevant indexers based on their dimension names (see Broadcasting by dimension name for full details).
   b. Index the underlying array by the broadcast indexers, using NumPy's advanced indexing rules.
6. If any indexer DataArray has coordinates and no coordinate with the same name exists, attach them to the indexed object.
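A sketch that checks several of these rules on a small hypothetical array, by comparing each indexing expression with the form the rules say it is converted into. It illustrates the stated semantics only; it does not reproduce xarray's internal code path.

```python
import numpy as np
import xarray as xr

# Hypothetical 3 x 4 array with labels on "x" only.
da = xr.DataArray(
    np.arange(12).reshape(3, 4), dims=("x", "y"), coords={"x": [10, 20, 30]}
)

# Rule 1: label-based indexing becomes positional via the pandas.Index.
positions = da.indexes["x"].get_indexer([30, 10])
assert da.sel(x=[30, 10]).equals(da.isel(x=positions))

# Rule 3: a slice is equivalent to the integer array np.arange(*slice.indices(n)).
s = slice(1, 4, 2)
as_array = np.arange(*s.indices(da.sizes["y"]))
assert da.isel(y=s).equals(da.isel(y=as_array))

# Rule 4: a plain list is treated as a DataArray along the dimension it indexes.
assert da.isel(x=[0, 2]).equals(da.isel(x=xr.DataArray([0, 2], dims="x")))

# Rule 5: indexers on different new dimensions broadcast against each other,
# so 2 rows along "a" and 3 columns along "b" give a 2 x 3 result.
grid = da.isel(x=xr.DataArray([0, 2], dims="a"), y=xr.DataArray([1, 2, 3], dims="b"))
print(grid.dims, grid.values.tolist())  # ('a', 'b') [[1, 2, 3], [9, 10, 11]]
```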
Note
Only 1-dimensional boolean arrays can be used as indexers.