Question
I guess I'm missing something (I'm still a Dask noob), but I'm trying the batching suggestion for avoiding too many Dask tasks from here:
https://docs.dask.org/en/latest/delayed-best-practices.html
and I can't make it work. This is what I tried:
import dask

def f(x):
    return x*x

def batch(seq):
    sub_results = []
    for x in seq:
        sub_results.append(f(x))
    return sub_results

batches = []
for i in range(0, 1000000000, 1000000):
    result_batch = dask.delayed(batch, range(i, i + 1000000))
    batches.append(result_batch)
The list batches now contains Delayed objects:
batches[:3]
[Delayed(range(0, 1000000)),
Delayed(range(1000000, 2000000)),
Delayed(range(2000000, 3000000))]
But when I compute them, I get back what look like references to the batch function itself (I think?):
results = dask.compute(*batches)
results[:3]
(<function __main__.batch(seq)>,
<function __main__.batch(seq)>,
<function __main__.batch(seq)>)
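A minimal, scaled-down sketch of what the original line actually builds. It assumes the signature dask.delayed(obj, name=None, pure=None, nout=None, traverse=True), in which the second positional parameter is name, not an argument for batch. The string "batch-0" is a hypothetical name standing in for the range object the original passed in that position.

```python
import dask

def batch(seq):
    # Square every element of seq in ordinary Python.
    return [x * x for x in seq]

# The second positional argument is taken as `name` (the task key),
# so this only wraps the function object; batch is never called.
wrapped = dask.delayed(batch, "batch-0")
print(wrapped.key)        # batch-0

# Computing a delayed object that wraps a bare function returns that function.
(value,) = dask.compute(wrapped)
print(value is batch)     # True
```

This matches the output above: the repr shows the range as the key, and compute hands back the batch function once per batch.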
I have two questions:
1. Is this really how it should be run? It seems to contradict the first line of the Best Practices page, which says not to call it like delayed(f(x)), because that would run immediately rather than lazily.
2. How do I get the results of the batched run above?
Answer 1:
It looks like your code is missing a pair of parentheses; I'm not sure whether this was a typo. Following the example in the docs, I think you wanted
result_batch = dask.delayed(batch)(range(i, i + 1000000))
where I replaced batch, ran... with batch)(ran..., since it is the call to the batch() function that should be delayed.
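The question's loop with that fix applied, as a scaled-down sketch: 10,000 items in batches of 1,000 instead of 10^9 items in batches of 10^6, so it finishes instantly (the constants N_ITEMS and BATCH_SIZE are illustrative choices, not from the original). It also shows one way to collect the results: dask.compute(*batches) returns a tuple with one list per batch, and itertools.chain can flatten them if a single sequence is wanted.

```python
import itertools

import dask

def f(x):
    return x * x

def batch(seq):
    # Apply f to every element of one batch inside a single Dask task.
    sub_results = []
    for x in seq:
        sub_results.append(f(x))
    return sub_results

N_ITEMS = 10_000      # scaled down from 1,000,000,000 for illustration
BATCH_SIZE = 1_000    # scaled down from 1,000,000 for illustration

batches = []
for i in range(0, N_ITEMS, BATCH_SIZE):
    # Wrap the function first, then call the wrapper: the call is delayed.
    result_batch = dask.delayed(batch)(range(i, i + BATCH_SIZE))
    batches.append(result_batch)

results = dask.compute(*batches)                     # one list per batch
flat = list(itertools.chain.from_iterable(results))  # single flat list
print(len(batches), len(flat))  # 10 10000
print(flat[:5])                 # [0, 1, 4, 9, 16]
```

Each batch becomes one task, so the graph holds N_ITEMS // BATCH_SIZE tasks rather than one per element, which is the point of the batching advice.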
Answers to your two questions:
- With the typo fixed, your code works fine for me: the computation is now delayed. Regarding what's written at the start of the docs, what matters is what gets wrapped with dask.delayed. With dask.delayed(batch(range(i, i + 1000000))), the call batch(...) is not delayed, so it runs immediately: Python evaluates batch(...) first, and only its already-computed output (the result) is wrapped in dask.delayed, which is not the desired workflow. By contrast, dask.delayed(batch)(range(i, i + 1000000)) delays the call to the function, because here dask.delayed wraps the function itself. I believe this is what the docs are trying to say at the start of the Best Practices page.
- Again, with the typo fixed, your code runs as expected for me: dask.compute(*batches) returns one list of results per batch, and printing them produces a lengthy output on the screen.
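A small sketch contrasting the two spellings, using a hypothetical module-level list CALLS to record when batch actually executes. The synchronous scheduler is chosen so the side effect is observed in the same process; that choice is an assumption for the demo, not something the answer requires.

```python
import dask

CALLS = []  # hypothetical log of when batch() really runs

def batch(seq):
    CALLS.append(len(seq))       # record each real execution
    return [x * x for x in seq]

# Eager: batch(...) runs right now; only its finished list gets wrapped.
eager = dask.delayed(batch(range(3)))
print(len(CALLS))  # 1

# Lazy: dask.delayed wraps the function, so calling it only builds a task.
lazy = dask.delayed(batch)(range(3))
print(len(CALLS))  # still 1

# Both compute to the same list, but only the lazy one runs batch now.
eager_value, lazy_value = dask.compute(eager, lazy, scheduler="synchronous")
print(eager_value, lazy_value)  # [0, 1, 4] [0, 1, 4]
print(len(CALLS))  # 2
```

With the eager form, all the real work happens while the graph is being built, so Dask has nothing left to parallelize.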
Source: https://stackoverflow.com/questions/56453201/how-do-the-batching-instructions-of-dask-delayed-best-practices-work