Autoscaler scales down to 1 worker. #930

Open
me-her opened this issue Jan 21, 2025 · 2 comments

Comments

@me-her

me-her commented Jan 21, 2025

Autoscaler scales down to 1 worker despite configuring the minimum to be 2 workers:

Testing Scenario:

# Ran a big computation
import distributed
import dask.array as da

client = await distributed.Client("<hosted-url>:8786", asynchronous=True, direct_to_workers=True)

array = da.random.random(size=(40960, 4096, 4096), chunks="256M").astype("float32")
mean = await client.compute(array.mean())

1st run: minimum 8 workers, maximum 16 workers.

2nd run: minimum 2 workers, maximum 12 workers.

Scaling up works as expected: it reaches 16 workers in the 1st run and 12 workers in the 2nd run.
While scaling down, the operator logs shown below indicate that it scales down to 2 (the configured minimum), but then I see only one worker remaining. This happened on both runs.
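
For reference, here is roughly how I check the remaining worker count after the computation finishes (a minimal sketch assuming the same async client as above; scheduler_info() is the distributed.Client call that lists the workers the scheduler currently knows about):

# Sketch: count the workers the scheduler currently reports
# (uses the scheduler info held by the client).
info = client.scheduler_info()
print(len(info["workers"]), "workers connected:", sorted(info["workers"]))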

Operator Logs

[screenshot]

Scheduler Logs

[screenshot]

Anything else we need to know?:

I see the workers scale down to 1 even when the configured minimum was 8. An earlier version of the dask-operator had a problem of staying at the maximum number of workers: once it scaled up to the max, it never scaled back down. With this version the scale-up works as expected, but the scale-down goes all the way down to 1.

Environment:

  • Dask version: 2024.12.1
  • Distributed: 2024.12.1
  • Dask-Operator: 2025.1.0
  • Python version: 3.10
  • Operating System: Linux
  • Install method (conda, pip, source): pip
  • Running this on GKE
@jacobtomlinson
Member

Would you be able to create a complete example, including creating the cluster and running some workloads, that reproduces the problem?

Ideally it should be the smallest amount of code you can write that reproduces the issue, so that I can copy/paste it to see the problem for myself.

@me-her
Author

me-her commented Jan 22, 2025

We create the cluster using the dask-kubernetes operator; it's a Helm chart deployment in GKE. I also configure the min and max workers in autoscaler.yml.

Once my scheduler and workers are up, I connect to the cluster and run the computation like below.

import distributed
import dask.array as da

client = await distributed.Client("<hosted-url>:8786", asynchronous=True, direct_to_workers=True)

array = da.random.random(size=(40960, 4096, 4096), chunks="256M").astype("float32")
mean = await client.compute(array.mean())

Since this is a large enough computation, the scaling triggers.
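
If it helps, here is roughly the same workflow as a single copy/paste script, using the dask_kubernetes Python API instead of the Helm chart (a sketch rather than my exact deployment; the cluster name and image are placeholders, and I am assuming KubeCluster.adapt() sets the same min/max bounds that autoscaler.yml does):

# Sketch of an equivalent programmatic setup (placeholder name/image).
from dask_kubernetes.operator import KubeCluster
import dask.array as da

cluster = KubeCluster(name="repro", image="ghcr.io/dask/dask:2024.12.1")
cluster.adapt(minimum=2, maximum=12)  # same bounds as the 2nd run
client = cluster.get_client()

array = da.random.random(size=(40960, 4096, 4096), chunks="256M").astype("float32")
mean = client.compute(array.mean()).result()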
