(Please feel free to re-categorize.)

**Context**

For context: I am benchmarking Dask workloads (petabytes of data) on the Dask distributed backend vs. the Ray backend.

I’ve completed the benchmark for the Dask distributed backend. My workload would cost ~$1 million (yes, you read that right): it would need 35 r5.24xlarge instances running continuously for ~180 days.

I am hoping the Ray backend can do better.

So I’ve set up a Ray cluster on AWS EC2 (with the autoscaler).

The workload consists of about a million Dask graphs. Each graph’s last node is a Dask delayed object, so I compute it with `delayed_obj.compute(scheduler=ray_dask_get)`.
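For reference, the submission pattern looks roughly like this (a minimal sketch: the `work` function and batch size are placeholders, not my actual graphs; `ray.util.dask.ray_dask_get` is the real Dask-on-Ray scheduler entry point):

```python
# Sketch of the submission pattern, assuming ray and dask are installed.
try:
    import dask
    import ray
    from ray.util.dask import ray_dask_get  # Dask-on-Ray scheduler
    HAVE_DEPS = True
except ImportError:
    HAVE_DEPS = False

if HAVE_DEPS:
    try:
        ray.init(address="auto")  # connect to an already-running cluster

        @dask.delayed
        def work(x):
            return x * 2  # placeholder for the real per-graph computation

        graphs = [work(i) for i in range(1000)]
        # Compute each graph's terminal delayed node on the Ray backend.
        results = [g.compute(scheduler=ray_dask_get) for g in graphs]
    except ConnectionError:
        pass  # no cluster reachable in this environment
```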

**Problem**

It doesn’t seem that any of my worker nodes are getting utilized. I am submitting 1000 graphs to the cluster at a time, so there should be plenty of work to do.

I am wondering whether Dask on Ray + a distributed Ray cluster is supposed to work at all. I am looking at this line: ray/scheduler.py at master · ray-project/ray · GitHub. Each of my head/worker nodes has 96 vCPUs, so the total CPU count across the cluster should be 96 * 35. But when I `print(CPU_COUNT)`, where `CPU_COUNT` comes from `dask.system`, I get 96 (which makes sense, since it’s per node). So does this mean a thread pool of size 96 is being used instead of a pool of size 96 * 35?