How can I run a complete Dask.distributed cluster in a single thread? I want to use this for debugging or profiling.
Note: this is a frequently asked question.
If you can get by with the single-machine scheduler's API (just compute), then you can use the single-threaded scheduler:
x.compute(scheduler='single-threaded')
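For example, a small dask.delayed graph (the inc function here is purely illustrative) runs entirely in the calling thread, so pdb breakpoints and ordinary profilers behave as expected:

import dask

@dask.delayed
def inc(x):
    return x + 1

total = inc(inc(1))

# Everything executes in this thread, so breakpoints and profilers just work
result = total.compute(scheduler='single-threaded')
assert result == 3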
If you want to run a dask.distributed cluster on a single machine, you can start the client with no arguments:
from dask.distributed import Client
client = Client() # Starts local cluster
x.compute()
This uses many threads, but everything runs on one machine.
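If you want more control over the shape of that local cluster, Client forwards keyword arguments such as n_workers and threads_per_worker on to LocalCluster; for instance (the values here are just examples):

from dask.distributed import Client

# Two worker processes with one thread each (example values)
client = Client(n_workers=2, threads_per_worker=1)
x.compute()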
Alternatively, if you want to run everything in a single process, you can use the processes=False keyword:
from dask.distributed import Client
client = Client(processes=False)  # Starts local cluster in this process
x.compute()
All of the communication and control happen in a single thread, though computation occurs in a separate thread pool.
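A quick way to see that split is to ask a task which thread it runs in; with processes=False the task runs in the same process but in a worker thread, not the control thread:

import threading
from dask.distributed import Client

client = Client(processes=False)

worker_thread = client.submit(threading.get_ident).result()
print(worker_thread == threading.get_ident())  # False: tasks run in the worker thread pool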
To run control, communication, and computation all in a single thread, you need to create a Tornado concurrent.futures Executor. Beware: this Tornado API may not be public.
from dask.distributed import Scheduler, Worker, Client
from tornado.concurrent import DummyExecutor
from tornado.ioloop import IOLoop
import threading

loop = IOLoop()
e = DummyExecutor()          # Executor that runs each call inline, in the calling thread
s = Scheduler(loop=loop)
s.start()
w = Worker(s.address, loop=loop, executor=e)
loop.add_callback(w._start)  # Schedule worker startup; it runs once the loop starts

async def f():
    async with Client(s.address, start=False) as c:
        future = c.submit(threading.get_ident)
        result = await future
        return result
>>> threading.get_ident() == loop.run_sync(f)
True
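Since DummyExecutor may not be public, a minimal synchronous executor of your own could stand in for it. The SyncExecutor class below is a hypothetical sketch of such a drop-in, assuming the worker accepts any concurrent.futures.Executor that runs calls inline:

from concurrent.futures import Executor, Future

class SyncExecutor(Executor):
    """Hypothetical DummyExecutor replacement: runs each call inline."""
    def submit(self, fn, *args, **kwargs):
        future = Future()
        try:
            future.set_result(fn(*args, **kwargs))
        except Exception as exc:
            future.set_exception(exc)
        return future

e = SyncExecutor()  # Use in place of DummyExecutor() above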