Dask-distributed. How to get task key ID in the function being calculated?

Question

My computations with dask.distributed create intermediate files whose names include a UUID4 identifying that chunk of work:

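    import os
    import uuid

    pairs = '{}\n{}\n{}\n{}'.format(list1, list2, list3, ...)

    # uuid4().hex is the UUID as 32 hex digits, without dashes
    file_path = os.path.join(job_output_root, 'pairs',
                             'pairs-{}.txt'.format(uuid.uuid4().hex))

    with open(file_path, 'wt') as f:  # the with block closes the file
        f.write(pairs)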

At the same time, every task in a dask.distributed cluster has a unique key, so it would be natural to use that key in the file name.

Is it possible?

Answer 1:

There are two ways to approach the problem:

1. You determine the uuid and pass it to Dask (implemented).
2. Dask determines the uuid and passes it to your function (not implemented when this answer was first written; see below for version 1.13 and later).

You pass the uuid to Dask

Functions like .submit accept a key= keyword argument that sets the key Dask uses for the task (here e is a distributed Executor, the class later versions renamed Client):

    >>> e.submit(inc, 1, key='inc-12345')
    <Future: status: pending, key: inc-12345>
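A minimal sketch of the first approach applied to the question's files, assuming the caller already has a Client (or Executor). The helper names new_task_id, write_pairs and submit_with_key are hypothetical. The UUID is generated up front, used in the key, and passed to the function so the file name matches the key. Only the bare UUID is passed as an argument, not the full key string, to stay clear of Dask reading a string argument that equals a graph key as a reference to that key:

```python
import os
import uuid


def new_task_id():
    """Dash-free UUID4, the same form the question uses (hypothetical helper)."""
    return uuid.uuid4().hex


def write_pairs(pairs, job_output_root, task_id):
    """Write `pairs` to <job_output_root>/pairs/pairs-<task_id>.txt; return the path."""
    path = os.path.join(job_output_root, "pairs", "pairs-{}.txt".format(task_id))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wt") as f:
        f.write(pairs)
    return path


def submit_with_key(client, pairs, job_output_root):
    """Submit write_pairs so its Dask key is 'pairs-<id>', matching the file name."""
    task_id = new_task_id()
    # The function gets the bare UUID; the key= argument names the task in Dask.
    return client.submit(write_pairs, pairs, job_output_root, task_id,
                         key="pairs-{}".format(task_id))
```

With a real cluster this would be called as future = submit_with_key(Client(), pairs, job_output_root), with Client imported from dask.distributed; future.key is then the stem of the file the task writes.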

Similarly, dask.delayed functions support a dask_key_name keyword argument:

    >>> value = delayed(inc)(1, dask_key_name='inc-12345')
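A minimal runnable version of the delayed form, assuming dask is installed; inc and tag are illustrative functions. The name given through dask_key_name becomes the Delayed object's .key before anything is computed, so the first approach works here too: pick the UUID, build the key from it, and hand the bare UUID to the function:

```python
import uuid

from dask import delayed


def inc(x):
    return x + 1


def tag(x, task_id):
    # The function receives the same UUID that names its task.
    return "pairs-{}:{}".format(task_id, x)


value = delayed(inc)(1, dask_key_name="inc-12345")
print(value.key)        # inc-12345
print(value.compute())  # 2

task_id = uuid.uuid4().hex
tagged = delayed(tag)(7, task_id, dask_key_name="pairs-" + task_id)
print(tagged.key == "pairs-" + task_id)               # True
print(tagged.compute() == "pairs-" + task_id + ":7")  # True
```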

You get the key from Dask

While each task runs, the worker executing it places contextual information like this into a per-thread global. As of distributed version 1.13 the current task's key is available as follows:

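    def your_function(x):  # x is a placeholder argument; use your own signature
        from distributed.worker import thread_state
        key = thread_state.key  # key of the task running in this thread
        return key

    future = e.submit(your_function, 1)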
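A minimal sketch of the second approach applied to the question's files, assuming distributed is importable on the workers. Dask keys are not always plain strings (keys in dask collections are tuples such as ('x', 0, 1)), so the hypothetical key_to_filename helper flattens and sanitizes the key before it goes into a path. current_task_key and write_pairs are also illustrative names, not Dask APIs. Outside a worker task thread_state has no key attribute, so the sketch falls back to a default:

```python
import os
import re


def key_to_filename(key):
    """Turn a Dask key (a str, or a tuple such as ('x', 0, 1)) into a
    filesystem-safe name. Hypothetical helper, not part of Dask."""
    if isinstance(key, tuple):
        key = "-".join(str(part) for part in key)
    # Replace anything other than letters, digits, '.', '_' and '-' with '_'.
    return re.sub(r"[^A-Za-z0-9._-]", "_", str(key))


def current_task_key(default="no-task"):
    """Return the key of the task running in this thread, or `default`
    when called outside a Dask worker task."""
    try:
        from distributed.worker import thread_state
        return thread_state.key
    except (ImportError, AttributeError):
        return default


def write_pairs(pairs, job_output_root):
    # Name the output file after the task that produced it.
    name = key_to_filename(current_task_key())
    path = os.path.join(job_output_root, "pairs", "pairs-{}.txt".format(name))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wt") as f:
        f.write(pairs)
    return path
```

Submitted as e.submit(write_pairs, pairs, job_output_root), each call writes a file named after its own task key; called directly, outside Dask, it writes pairs-no-task.txt.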

Source: https://stackoverflow.com/questions/39330017/dask-distributed-how-to-get-task-key-id-in-the-function-being-calculated

Tags: python-2.7, distributed, distributed-computing, dask
