Computation#

The labels associated with DataArray and Dataset objects enable some powerful shortcuts for computation, notably aggregation and broadcasting by dimension name.

Aggregation#

Aggregation methods take a dim argument instead of axis. This allows intuitive syntax for aggregating along one or more named dimensions:

arr.sum(dim="x")

<xarray.DataArray (y: 3)> Size: 24B
array([0.65348932, 1.08305695, 0.9537291 ])
Coordinates:
 * y (y) int64 24B 10 20 30

arr.std(["x", "y"])

<xarray.DataArray ()> Size: 8B
array(0.35793963)

arr.min()

<xarray.DataArray ()> Size: 8B
array(0.01652764)

If you need to figure out the axis number for a dimension yourself (say, for wrapping code designed to work with numpy arrays), you can use the get_axis_num() method:

arr.get_axis_num("y")
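As a minimal sketch of that wrapping pattern, assume a small array with dimensions x and y. The array `demo` and the helper `numpy_range` below are hypothetical names introduced only for illustration; they are not part of xarray.

```python
import numpy as np
import xarray as xr

# Hypothetical 2 x 3 array; the values are chosen only for illustration.
demo = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=["x", "y"],
    coords={"y": [10, 20, 30]},
)


def numpy_range(values, axis):
    """NumPy-only routine that knows nothing about dimension names."""
    return values.max(axis=axis) - values.min(axis=axis)


# Translate the dimension name into the positional axis NumPy expects.
axis = demo.get_axis_num("y")
result = numpy_range(demo.values, axis=axis)
```

Here the result is a plain NumPy array, so any labels have to be reattached by hand if they are still needed.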

These operations automatically skip missing values, like in pandas:

xr.DataArray([1, 2, np.nan, 3]).mean()

<xarray.DataArray ()> Size: 8B
array(2.)

If desired, you can disable this behavior by invoking the aggregation method with skipna=False.
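A short sketch of the difference, reusing the values from the example above (the variable names are illustrative):

```python
import numpy as np
import xarray as xr

values = xr.DataArray([1, 2, np.nan, 3])

# Default behavior: the NaN is ignored, so the mean is (1 + 2 + 3) / 3.
skipped = values.mean()

# With skipna=False the NaN propagates into the result.
propagated = values.mean(skipna=False)
```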

Computation using Coordinates#

Xarray objects have some handy methods for computing with their coordinates. differentiate() computes derivatives by central finite differences, using the coordinate values as the sample positions:

a = xr.DataArray([0, 1, 2, 3], dims=["x"], coords=[[0.1, 0.11, 0.2, 0.3]])
a.differentiate("x")

<xarray.DataArray (x: 4)> Size: 32B
array([100. , 91.11111111, 10.58479532, 10. ])
Coordinates:
 * x (x) float64 32B 0.1 0.11 0.2 0.3
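The values above are consistent with what numpy.gradient returns when it is given the same unevenly spaced coordinate. The following check is a sketch that assumes this equivalence for the default settings; `reference` is an illustrative name.

```python
import numpy as np
import xarray as xr

x = [0.1, 0.11, 0.2, 0.3]
a = xr.DataArray([0, 1, 2, 3], dims=["x"], coords=[x])

derivative = a.differentiate("x")

# Reference computed directly with NumPy on the unevenly spaced coordinate.
reference = np.gradient(a.values, np.asarray(x))
```

Note that the two end points use one-sided differences, since there is no neighbor on one side.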

This method can also be used for multidimensional arrays:

a = xr.DataArray(
    np.arange(8).reshape(4, 2), dims=["x", "y"], coords={"x": [0.1, 0.11, 0.2, 0.3]}
)
a.differentiate("x")

<xarray.DataArray (x: 4, y: 2)> Size: 64B
array([[200. , 200. ],
 [182.22222222, 182.22222222],
 [ 21.16959064, 21.16959064],
 [ 20. , 20. ]])
Coordinates:
 * x (x) float64 32B 0.1 0.11 0.2 0.3
Dimensions without coordinates: y

integrate() integrates along a coordinate using the trapezoidal rule:

a.integrate("x")

<xarray.DataArray (y: 2)> Size: 16B
array([0.78, 0.98])
Dimensions without coordinates: y
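To make the trapezoidal rule concrete, the same result can be reproduced by hand with NumPy. This is only a sketch for checking the numbers; `dx`, `mid` and `manual` are illustrative names.

```python
import numpy as np
import xarray as xr

a = xr.DataArray(
    np.arange(8).reshape(4, 2),
    dims=["x", "y"],
    coords={"x": [0.1, 0.11, 0.2, 0.3]},
)

# Trapezoidal rule by hand: interval width times the mean of its end points.
dx = np.diff(a["x"].values)                     # widths of the 3 intervals
mid = (a.values[1:, :] + a.values[:-1, :]) / 2  # mean height per interval
manual = (dx[:, np.newaxis] * mid).sum(axis=0)

integral = a.integrate("x")
```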

Note

These methods are limited to simple Cartesian geometry. Differentiation and integration along multidimensional coordinates are not supported.

Automatic alignment#

Xarray enforces alignment between index coordinates (that is, coordinates with the same name as a dimension, marked by *) on objects used in binary operations.

As in pandas, this alignment happens automatically in binary arithmetic operations. By default, the result uses the intersection (not the union) of coordinate labels:

arr = xr.DataArray(np.arange(3), [("x", range(3))])
arr + arr[:-1]

<xarray.DataArray (x: 2)> Size: 16B
array([0, 2])
Coordinates:
 * x (x) int64 16B 0 1

If coordinate values for a dimension are missing on either argument, all matching dimensions must have the same size:

arr + xr.DataArray([1, 2], dims="x")

AlignmentError: cannot reindex or align along dimension 'x' because of conflicting dimension sizes: {2, 3} (note: an index is found along that dimension with size=3)

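However, you can explicitly change the default alignment type ("inner") with set_options(), used as a context manager:

with xr.set_options(arithmetic_join="outer"):
    # outer join: labels missing from one operand are filled with NaN
    arr + arr[:1]
# outside the context manager, the default inner join applies again
arr + arr[:1]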

<xarray.DataArray (x: 1)> Size: 8B
array([0])
Coordinates:
 * x (x) int64 8B 0

Before loops or performance-critical code, it’s a good idea to align arrays explicitly (e.g., by putting them in the same Dataset or using align()) to avoid the overhead of repeated alignment with each operation. See Align and reindex for more details.
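A minimal sketch of aligning once before a loop, assuming the small `arr` from the examples above. The names `other`, `arr_aligned`, `other_aligned` and `total` are illustrative.

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.arange(3), [("x", range(3))])
other = arr[:-1]

# Align once, up front; both results share the same "x" index afterwards.
arr_aligned, other_aligned = xr.align(arr, other, join="inner")

total = 0
for _ in range(100):
    # The indexes already match, so each operation has nothing to reindex.
    total = total + (arr_aligned * other_aligned).sum()
```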

Note

There is no automatic alignment between arguments when performing in-place arithmetic operations such as +=. You will need to use manual alignment. This ensures in-place arithmetic never needs to modify data types.
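One way to align manually before an in-place update is reindex_like(). The sketch below assumes that filling the missing label with 0 is the desired behavior, and it uses float data so the update cannot change the data type; the array names are illustrative.

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.arange(3.0), [("x", range(3))])
b = xr.DataArray([10.0, 20.0], [("x", [0, 1])])

# Manual alignment: give b the same "x" labels as a, filling the gap with 0.
b_on_a = b.reindex_like(a, fill_value=0.0)

# Shapes and labels now match, so the update happens in place.
a += b_on_a
```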

Wrapping custom computation#

It doesn’t always make sense to do computation directly with xarray objects:

In the inner loop of performance-limited code, using xarray can add considerable overhead compared to using NumPy or native Python types. This is particularly true when working with scalars or small arrays (less than ~1e6 elements). Keeping track of labels and ensuring their consistency adds overhead, and xarray’s core itself is not especially fast, because it’s written in Python rather than a compiled language like C. Also, xarray’s high-level label-based APIs remove low-level control over how operations are implemented.

Even if speed doesn’t matter, it can be important to wrap existing code, or to support alternative interfaces that don’t use xarray objects.

For these reasons, it is often well-advised to write low-level routines that work with NumPy arrays, and to wrap these routines to work with xarray objects. However, adding support for labels on both Dataset and DataArray can be a bit of a chore.

To make this easier, xarray supplies the apply_ufunc() helper function, designed for wrapping functions that support broadcasting and vectorization on unlabeled arrays in the style of a NumPy universal function ("ufunc" for short). apply_ufunc takes care of everything needed for an idiomatic xarray wrapper, including alignment, broadcasting, looping over Dataset variables (if needed), and merging of coordinates. In fact, many internal xarray functions/methods are written using apply_ufunc.

Simple functions that act independently on each value should work without any additional arguments:

squared_error = lambda x, y: (x - y) ** 2
arr1 = xr.DataArray([0, 1, 2, 3], dims="x")
xr.apply_ufunc(squared_error, arr1, 1)

<xarray.DataArray (x: 4)> Size: 32B
array([1, 0, 1, 4])
Dimensions without coordinates: x
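The same call also accepts a Dataset, in which case the function is applied to each data variable. A minimal sketch, assuming a hypothetical Dataset `ds` with two variables along x:

```python
import xarray as xr

squared_error = lambda x, y: (x - y) ** 2

# Hypothetical Dataset with two variables sharing the "x" dimension.
ds = xr.Dataset({"a": ("x", [0, 1, 2, 3]), "b": ("x", [3, 2, 1, 0])})

# The function is applied to each data variable in turn.
result = xr.apply_ufunc(squared_error, ds, 1)
```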

To use more complex operations that consider some array values collectively, it’s important to understand the idea of "core dimensions" from NumPy’s generalized ufuncs. Core dimensions are defined as dimensions that should not be broadcast over. Usually, they correspond to the fundamental dimensions over which an operation is defined, e.g., the summed axis in np.sum. A good clue that core dimensions are needed is the presence of an axis argument on the corresponding NumPy function.

With apply_ufunc, core dimensions are recognized by name, and then moved to the last dimension of any input arguments before applying the given function. This means that for functions that accept an axis argument, you usually need to set axis=-1. As an example, here is how we would wrap numpy.linalg.norm() to calculate the vector norm:

def vector_norm(x, dim, ord=None):
    return xr.apply_ufunc(
        np.linalg.norm, x, input_core_dims=[[dim]], kwargs={"ord": ord, "axis": -1}
    )

vector_norm(arr1, dim="x")

<xarray.DataArray ()> Size: 8B
array(3.74165739)

Because apply_ufunc follows a standard convention for ufuncs, it plays nicely with tools for building vectorized functions, like numpy.broadcast_arrays() and numpy.vectorize. For high performance needs, consider using Numba’s vectorize and guvectorize.
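For a function that only handles a single 1-D vector and has no axis argument, apply_ufunc can do the numpy.vectorize wrapping itself via its vectorize=True option. The sketch below uses a hypothetical helper `peak_to_peak_1d` and a made-up 2 x 3 array; this route is convenient rather than fast, since it loops in Python.

```python
import numpy as np
import xarray as xr


def peak_to_peak_1d(values):
    """Works on a single 1-D vector only; takes no axis argument."""
    return values.max() - values.min()


# Hypothetical 2 x 3 array; each row along "x" is one vector.
data = xr.DataArray([[1.0, 4.0, 2.0], [0.0, 5.0, 5.0]], dims=["row", "x"])

result = xr.apply_ufunc(
    peak_to_peak_1d,
    data,
    input_core_dims=[["x"]],  # "x" is consumed by the function
    vectorize=True,           # loop over the remaining dims via numpy.vectorize
)
```

The core dimension x disappears from the result, which keeps only the row dimension.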

In addition to wrapping functions, apply_ufunc can automatically parallelize many functions when using dask by setting dask='parallelized'. See Parallelize custom functions with apply_ufunc and map_blocks for details.

apply_ufunc() also supports some advanced options for controlling alignment of variables and the form of the result. See the docstring for full details and more examples.
