Question
I have a setup as follows:
# etl.py
from dask.distributed import Client
import dask

from tasks import task1, task2, task3

def runall(**kwargs):
    print("done")

def etl():
    client = Client()

    tasks = {}
    tasks['task1'] = dask.delayed(task1)(*args)   # args elided in the original
    tasks['task2'] = dask.delayed(task2)(*args)
    tasks['task3'] = dask.delayed(task3)(*args)

    out = dask.delayed(runall)(**tasks)
    out.compute()
This logic was borrowed from Luigi and works nicely with if statements to control which tasks to run, as sketched below.
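For illustration, a minimal sketch of that gating pattern, not the asker's actual code (the run_task2 flag and the zero-argument task calls are hypothetical):

def etl(run_task2=True):
    client = Client()
    tasks = {}
    tasks['task1'] = dask.delayed(task1)()
    if run_task2:  # hypothetical flag deciding whether task2 joins the graph
        tasks['task2'] = dask.delayed(task2)()
    out = dask.delayed(runall)(**tasks)
    out.compute()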
However, some of the tasks load large amounts of data from SQL and cause GIL-freeze warnings (at least this is my suspicion, as it is hard to diagnose exactly which line causes the issue). Sometimes the graph/monitoring dashboard on port 8787 shows nothing, just an empty scheduler; I suspect this is caused by the app freezing dask. What is the best way to load large amounts of data from SQL (MSSQL and Oracle) in dask? At the moment this is done with SQLAlchemy with tuned settings. Would adding async and await help?
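One commonly suggested option for this kind of load is to let dask partition the read itself with dask.dataframe.read_sql_table, which issues one bounded query per partition across the workers rather than one huge client-side fetch. A minimal sketch, assuming a pyodbc-backed MSSQL connection and a numeric indexed column; the URI, table, and column names here are placeholders:

import dask.dataframe as dd

# Placeholder SQLAlchemy URI; substitute your MSSQL or Oracle connection.
uri = "mssql+pyodbc://user:password@my_dsn"

df = dd.read_sql_table(
    "my_table",        # hypothetical table name
    uri,
    index_col="id",    # an indexed numeric column to partition on
    npartitions=40,    # e.g. one partition per core on a 40-core box
)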
However, some of the tasks are a bit slow and I'd like to use things like dask.dataframe or bag internally. The docs advise against calling delayed inside delayed. Does this also hold for dataframe and bag? The entire script runs on a single 40-core machine.
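The alternative that the dask.distributed documentation describes for computing a collection from inside a task is worker_client, which gives the task a client that can submit further work without deadlocking its worker thread. A minimal sketch (the bag contents are made up):

import dask.bag as db
from dask.distributed import worker_client

def slow_task():
    # Inside a task already running on a worker, worker_client() yields
    # a client that can safely drive nested dask collections.
    with worker_client() as client:
        bag = db.from_sequence(range(1000), npartitions=8).map(lambda x: x * 2)
        return client.compute(bag).result()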
Using bag.starmap I get a graph like this:

[dashboard task-graph screenshot omitted]

where the upper straight lines are added/discovered once the computation reaches that task and compute is called inside it.
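For reference, bag.starmap applies a function to each argument tuple in the bag; a small, self-contained illustration (the data here is made up):

import dask.bag as db

# starmap unpacks each tuple into the function's positional arguments.
pairs = db.from_sequence([(1, 2), (3, 4), (5, 6)], npartitions=2)
sums = pairs.starmap(lambda a, b: a + b)
print(sums.compute())  # [3, 7, 11]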
Answer 1:
There appears to be no rhyme or reason beyond the machine being very busy and struggling to render the state updates and bokeh plots as expected.
Source: https://stackoverflow.com/questions/64911735/dask-scheduler-empty-graph-not-showing