
Examples#

Dask Array is used across a wide variety of applications — anywhere that involves working with large array datasets.

The Analyzing the National Water Model with Xarray, Dask, and Coiled example processes 6 TB of geospatial data on a cluster using Xarray and Dask Array. The cluster in this example is deployed with Coiled, but there are many options for managing and deploying Dask. See our Deploy Dask Clusters documentation for more information on deployment options.

You can also visit https://examples.dask.org/array.html for a collection of additional examples.

Design#


Dask arrays coordinate many NumPy arrays (or "duck arrays" that are sufficiently NumPy-like in API such as CuPy or Sparse arrays) arranged into a grid. These arrays may live on disk or on other machines.
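As a minimal sketch of this grid structure, the snippet below builds a small Dask array and inspects how it is split into NumPy chunks (the array shape and chunk sizes here are arbitrary, chosen only for illustration):

```python
import dask.array as da

# A 4x4 array of ones, stored as a 2x2 grid of 2x2 NumPy chunks
x = da.ones((4, 4), chunks=(2, 2))

# numblocks reports the shape of the chunk grid,
# and chunks reports the sizes along each axis
grid_shape = x.numblocks   # (2, 2)
chunk_sizes = x.chunks     # ((2, 2), (2, 2))
```

Each of the four blocks is an ordinary 2x2 NumPy array; Dask's job is to coordinate operations across them.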

New duck array chunk types (types below Dask on NEP-13’s type-casting hierarchy) can be registered via register_chunk_type(). Any other duck array types that are not registered will be deferred to in binary operations and NumPy ufuncs/functions (that is, Dask will return NotImplemented). Note, however, that any ndarray-like type can be inserted into a Dask Array using from_array().
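To illustrate the last point, here is a hedged sketch of wrapping an in-memory NumPy array with from_array() and mixing it with plain NumPy in a binary operation (the array contents are arbitrary example data):

```python
import numpy as np
import dask.array as da

# Any ndarray-like object can be wrapped; here, a plain NumPy array
x = np.arange(100).reshape(10, 10)

# Split the wrapped array into a grid of 5x5 chunks
d = da.from_array(x, chunks=(5, 5))

# Mixing Dask and NumPy arrays works: NumPy defers the binary
# operation to Dask, which builds a lazy task graph
result = (d + x).sum().compute()
```

Because `x` sums to 4950, `result` evaluates to 9900 once computed.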

Common Uses#

Dask Array is used in fields like atmospheric and oceanographic science, large scale imaging, genomics, numerical algorithms for optimization or statistics, and more.

Scope#

Dask arrays support most of the NumPy interface, including arithmetic and scalar mathematics, reductions such as sum and mean, slicing, axis reordering, and many linear algebra routines.

However, Dask Array does not implement the entire NumPy interface. Users expecting this will be disappointed. Notably, Dask Array lacks much of np.linalg, operations whose output shape depends on the array's values (which are difficult to express in a lazy, chunked model), and operations that are inherently inefficient on chunked data, such as converting the full array to a Python list.
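The supported portion of the interface follows NumPy closely. As a small sketch (the input values are arbitrary), familiar slicing, arithmetic, and reduction syntax carries over directly:

```python
import dask.array as da

# A lazy 1-D array of 0..9 in two chunks of 5
x = da.arange(10, chunks=5)

# Slicing, elementwise arithmetic, and reductions use NumPy syntax;
# nothing runs until .compute() is called
y = (x[::2] ** 2).sum().compute()
```

Here `x[::2]` selects 0, 2, 4, 6, 8, whose squares sum to 120.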

See the dask.array API for a more extensive list of functionality.

Execution#

By default, Dask Array uses the threaded scheduler in order to avoid data transfer costs, and because NumPy releases the GIL during most operations. It is also quite effective on a cluster using the dask.distributed scheduler.
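The scheduler can also be requested explicitly at compute time. A minimal sketch, assuming a local machine with no distributed cluster (the array shape and chunking are arbitrary):

```python
import dask.array as da

# A lazy 1000x1000 random array in 250x250 chunks
x = da.random.random((1000, 1000), chunks=(250, 250))

# Arrays use the threaded scheduler by default; passing
# scheduler="threads" makes that choice explicit
total = x.sum().compute(scheduler="threads")
```

To run on a cluster instead, one would create a dask.distributed Client, after which computations are routed to it automatically.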

