Will array_view.synchronize_async wait for parallel_for_each completion?

Posted by 雨燕双飞 on 2019-12-20 04:41:19

Question


If I have a concurrency::array_view being operated on in a concurrency::parallel_for_each loop, my understanding is that I can continue other tasks on the CPU while the loop is executing:

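#include <amp.h>
using namespace Concurrency;

// data is assumed to be a host container (e.g. std::vector<int>) holding number elements
array_view<int> av(number, data);
parallel_for_each(extent<1>(number), [=](index<1> idx) restrict(amp)
{
  // do some intense computations on av
});

// do some stuff on the CPU while we wait

av.synchronize(); // wait for the parallel_for_each loop to finish and copy the data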

But what if I don't want to wait for the parallel_for_each loop, and instead want to start copying data back from the GPU as soon as possible? Will the following work?

#include <amp.h>
using namespace Concurrency;

// data is assumed to be a host container (e.g. std::vector<int>) holding number elements
array_view<int> av(number, data);
parallel_for_each(extent<1>(number), [=](index<1> idx) restrict(amp)
{
  // do some intense computations on av
});

// I know that we won't be waiting to synch when I call this, but will we be waiting here
// until the data is available on the GPU end to START copying?
completion_future waitOnThis = av.synchronize_async();

// will this line execute before parallel_for_each has finished processing, or only once it
// has finished processing and the data from "av" has started copying back?

waitOnThis.wait();

I read about this topic on The Moth, but after reading the following I'm not really any wiser:

Please note that the parallel_for_each executes as if synchronous to the calling code, but in reality, it is asynchronous. I.e. once the parallel_for_each call is made and the kernel has been passed to the runtime, the some_code_B region continues to execute immediately by the CPU thread, while in parallel the kernel is executed by the GPU threads. However, if you try to access the (array or array_view) data that you captured in the lambda in the some_code_B region, your code will block until the results become available. Hence the correct statement: the parallel_for_each is as-if synchronous in terms of visible side-effects, but asynchronous in reality.


Answer 1:


I don't like the way this has been explained. A better way to think about it is that parallel_for_each queues work to the GPU, so it returns almost immediately. There are several ways your CPU-side code can then block until the queued work is complete, for example by explicitly calling synchronize, or by accessing data from one of the array_view instances used within the parallel_for_each.

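#include <amp.h>
using namespace concurrency;

// data is assumed to be a host container (e.g. std::vector<int>) holding number elements
array_view<int> av(number, data);
parallel_for_each(extent<1>(number), [=](index<1> idx) restrict(amp)
{
  // Queue (or schedule if you like) some intense computations on av
});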

Host code can execute now. The AMP computations may or may not have started. If the code here accesses av then it will block until the work on the GPU is complete and the data in av has been written and can be synchronized with the host memory.

The next call returns a future, so it is also a scheduled task. It is not guaranteed to execute at any particular point. Once it is scheduled, it will block the thread it is running on until av is correctly synchronized with the host memory (as above).

completion_future waitOnThis = av.synchronize_async();

More host code can execute here. If the host code accesses av then it will block until the parallel_for_each has completed (as above). At some point the runtime will execute the future and block until av has synchronized with the host memory. If it is writable and has been changed then it will be copied back to the host memory.

waitOnThis.wait();

The call to wait will block until the future has completed (prior to calling wait there is no guarantee that anything has actually executed). At this point you are guaranteed that the GPU calculations are complete and that av can be accessed on the CPU.
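As a rough model of the ordering described above, here is a minimal sketch that uses standard C++ futures instead of C++ AMP itself. The names fake_kernel, fake_synchronize_async and run_model are hypothetical and exist only for this illustration; the sketch assumes the asynchronous copy behaves like a task chained behind the kernel, which is an analogy rather than a description of how the AMP runtime is implemented.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for the GPU kernel: squares each element in place.
void fake_kernel(std::vector<int>& device) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50)); // pretend it is slow
    for (int& x : device) x *= x;
}

// Hypothetical stand-in for synchronize_async: returns a future immediately,
// but the copy itself only starts once the kernel has finished.
std::future<void> fake_synchronize_async(std::shared_future<void> kernel_done,
                                         const std::vector<int>& device,
                                         std::vector<int>& host) {
    return std::async(std::launch::async, [kernel_done, &device, &host] {
        kernel_done.wait();  // the copy cannot begin before the kernel completes
        host = device;       // "copy back" to host memory
    });
}

int run_model() {
    std::vector<int> device{1, 2, 3, 4};
    std::vector<int> host(device.size(), 0);

    // Analogue of parallel_for_each: queue the work and return immediately.
    std::shared_future<void> kernel_done =
        std::async(std::launch::async, fake_kernel, std::ref(device)).share();

    // Analogue of av.synchronize_async(): does not block the calling thread.
    std::future<void> copy_done = fake_synchronize_async(kernel_done, device, host);

    // Independent host work overlaps with both the kernel and the copy.
    int independent = 0;
    for (int i = 1; i <= 10; ++i) independent += i;

    // Analogue of waitOnThis.wait(): only after this are the results safe to read.
    copy_done.wait();

    return std::accumulate(host.begin(), host.end(), 0) + independent;
}
```

Calling run_model() from main exercises the whole sequence. In this model, requesting the asynchronous copy returns at once, the copy starts only after the kernel is done, and the host thread blocks only at the final wait.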

Having said all that, adding the waitOnThis future seems to overcomplicate matters. A plain synchronize call does the job:

// data is assumed to be a host container (e.g. std::vector<int>) holding number elements
array_view<int> av(number, data);
parallel_for_each(extent<1>(number), [=](index<1> idx) restrict(amp)
{
  // do some intense computations on av on the GPU
});

// do some independent CPU computation here.

av.synchronize();

// do some computation on the CPU that relies on av here.

The MSDN docs aren't very good on this topic. The following blog post is better. There are some other posts on the async APIs on the same blog.



Source: https://stackoverflow.com/questions/19830470/will-array-view-synchronize-asynch-wait-for-parallel-for-each-completion
