Stack Overflow
1 vote
1 answer
64 views

I am trying to learn Dask and have created the following toy example of a delayed pipeline:

    +-----+  +-----+  +-----+
    | baz +--+ bar +--+ foo |
    +-----+  +-----+  +-----+

So baz has a dependency on ...
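Here is a minimal sketch of that three-stage chain with dask.delayed. It assumes the links mean baz depends on bar and bar depends on foo, and the function bodies are hypothetical placeholders. Calling the decorated functions only builds the task graph. Nothing runs until .compute():

```python
from dask import delayed


@delayed
def foo():
    # Hypothetical leaf task: produces the initial value.
    return 1


@delayed
def bar(x):
    # Hypothetical middle task: consumes foo's result.
    return x + 10


@delayed
def baz(y):
    # Hypothetical final task: consumes bar's result.
    return y * 2


# Building the graph is lazy; no function body has executed yet.
result = baz(bar(foo()))

# The synchronous scheduler runs the graph in the current thread,
# which keeps this toy example deterministic and easy to debug.
value = result.compute(scheduler="synchronous")
print(value)  # 22
```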
0 votes
0 answers
74 views

I have a SQL table in Snowflake with 100K rows and 15 columns. I want to import this table into my Jupyter notebook using Dask for further analysis. I am primarily doing this as a form of practice since I am new ...
2 votes
0 answers
40 views

I need to run a random forest classifier, which I've wrapped in a function, about 10,000 times because I sample randomly each time. I am trying to use dask delayed on a Slurm-scheduled HPC cluster. My script ...
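Here is a minimal sketch of that pattern. It assumes each run can be expressed as a function of an integer seed. fit_once is a hypothetical stand-in that samples with the stdlib random module rather than fitting a real classifier. The point it illustrates is to build all the delayed calls first and hand them to a single dask.compute call:

```python
import random

import dask
from dask import delayed


def fit_once(seed, n_samples=50):
    """Hypothetical stand-in for one random-forest run.

    Draws a random sample with its own seeded RNG and returns a summary
    statistic; a real version would fit and score a classifier here.
    """
    rng = random.Random(seed)
    sample = [rng.random() for _ in range(n_samples)]
    return sum(sample) / n_samples


# One lazy task per run; each gets an explicit seed so runs are reproducible
# and independent regardless of which worker executes them.
n_runs = 100  # the question mentions ~10,000; kept small here
tasks = [delayed(fit_once)(seed) for seed in range(n_runs)]

# A single dask.compute call schedules all tasks together, instead of
# calling .compute() inside the loop, which would run them one at a time.
results = dask.compute(*tasks, scheduler="threads")

print(len(results), min(results), max(results))
```

On a Slurm cluster the same code should work with a dask.distributed Client connected to the cluster (for example one started via dask-jobqueue) in place of the local threaded scheduler.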
0 votes
1 answer
116 views

I'm trying to process a large dataset (around 1 million tasks) using Dask distributed in Python. I fetch the data from a database and retrieve around 1M rows. Here I ...
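Every Dask task carries some scheduler overhead. With around a million small work items, a common approach is to group them into batches so that each task processes many rows. This minimal sketch assumes the per-row work is independent. process_row, process_batch and batched are hypothetical names, and the batch size of 10,000 is an arbitrary choice:

```python
import dask
from dask import delayed


def process_row(row):
    # Hypothetical per-row work; replace with the real processing step.
    return row * 2


def process_batch(rows):
    # One task handles many rows, so the scheduler tracks far fewer tasks.
    return [process_row(r) for r in rows]


def batched(seq, size):
    # Split a sequence into consecutive chunks of at most `size` items.
    return [seq[i:i + size] for i in range(0, len(seq), size)]


rows = list(range(100_000))  # stand-in for rows fetched from the database
batches = batched(rows, 10_000)

# 10 batch tasks instead of 100,000 single-row tasks.
tasks = [delayed(process_batch)(b) for b in batches]
batch_results = dask.compute(*tasks, scheduler="threads")

# Flatten the per-batch lists back into one result list, preserving order.
results = [x for batch in batch_results for x in batch]
print(len(tasks), len(results))
```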
0 votes
1 answer
50 views

I have a class with something like the following context manager to create a Dask client and cluster:

    class some_class():
        def __init__(self,engine_kwargs: dict = None):
            self....
0 votes
0 answers
73 views

Context: I have 4 xarray datasets of 8 GB, 45 GB, 8 GB and 20 GB (roughly 80 GB in total). Each has one 3D variable with axes time, y, x. I want to combine them and save the output to disk. Operation on ...
0 votes
1 answer
72 views

Every time I try to compute the dataframe, it fails with the following (or a similar) error message: Exception: 'ValueError("could not convert string to float: \'<NA>\'")' Right now, ...
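One plausible cause is that the column holds the literal string "<NA>", which can happen when a nullable column is cast to str or written out as text. A plain float conversion cannot parse that string. Here is a minimal pandas sketch, on hypothetical data, that coerces such values to NaN instead:

```python
import pandas as pd

# Hypothetical column where missing values were written out as the literal
# string "<NA>" (for example after casting a nullable column to str).
df = pd.DataFrame({"value": ["1.5", "<NA>", "3.0"]})

# df["value"].astype(float) would raise:
#   ValueError: could not convert string to float: '<NA>'

# to_numeric with errors="coerce" turns unparseable strings into NaN instead.
df["value"] = pd.to_numeric(df["value"], errors="coerce")
print(df["value"].tolist())
```

For a Dask dataframe, dask.dataframe.to_numeric accepts the same errors="coerce" argument, so the conversion should carry over column by column.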
0 votes
1 answer
86 views

My code is meant to match names across two large datasets. The function I use creates a delayed list of matched names. After applying from_delayed, the number of partitions increases and is equal to the ...
1 vote
1 answer
66 views

This is a follow-up question to an earlier question: Implementing 1D interpolation on a 3D Array in Numpy or Xarray Tsoil is a 3D xarray dataset with the following dimensions: <xarray.DataArray '...
0 votes
0 answers
74 views

I have the following problem: I have a list of delayed objects after applying the following code (see below). When I apply ddf = dd.from_delayed(lazy_results_names), instead of a Dask dataframe I ...
0 votes
0 answers
118 views

I have ~30 GB of uncompressed spatial data containing id, tags, and coordinates as three columns in a Parquet file with a row-group size of 64 MB. I used dask read_parquet with block_size 32MiB and got 118 ...
0 votes
2 answers
281 views

I have multiple .csv.gz files that I'm trying to read into a Dask dataframe. I was able to achieve this using this code:

    file_paths = glob.glob(file_pattern)
    @delayed
    def read_csv(file_paths): ...
0 votes
0 answers
146 views

I have the plain Python function below, which involves no array operations but needs to run many times, so I parallelized it with dask.delayed. However, I can see a gradual ...
0 votes
1 answer
266 views

I'm trying to compute pairwise ratios on large-scale data where each column is a separate sample, like this (this is a small example):

           0      1      2
    0  34.04  56.55  49....
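Here is a minimal vectorised sketch with NumPy. It assumes the ratio of interest is column i divided by column j for every pair i < j. The 2 × 3 table is hypothetical data, not the question's. np.triu_indices enumerates the pairs so that all ratios come from one division:

```python
import numpy as np
import pandas as pd

# Hypothetical small table: each column is one sample, each row one feature.
df = pd.DataFrame([[2.0, 4.0, 8.0],
                   [3.0, 6.0, 1.5]])

values = df.to_numpy()
n = values.shape[1]

# Index arrays for every unordered column pair (i, j) with i < j.
i_idx, j_idx = np.triu_indices(n, k=1)

# Fancy indexing plus vectorised division computes all pairs in one step:
# ratios[r, k] == values[r, i_idx[k]] / values[r, j_idx[k]]
ratios = values[:, i_idx] / values[:, j_idx]

out = pd.DataFrame(
    ratios,
    index=df.index,
    columns=[f"{df.columns[i]}/{df.columns[j]}" for i, j in zip(i_idx, j_idx)],
)
print(out)
```

The output has n(n-1)/2 columns for n samples, so it grows quickly. For large inputs, the same computation could be applied per block of rows, for example with Dask's map_partitions.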
1 vote
0 answers
385 views

Consider an SSHCluster with multiple hosts:

    cluster = SSHCluster(["localhost", "hostname"], connect_options={"known_hosts": None}, ...
