Question
If I have a concurrency::array_view being operated on in a concurrency::parallel_for_each loop, my understanding is that I can continue other tasks on the CPU while the loop is executing:
using namespace Concurrency;
array_view<int> av; // construction over host data omitted
parallel_for_each(extent<1>(number), [=](index<1> idx) restrict(amp)
{
    // do some intense computations on av
});
// do some stuff on the CPU while we wait
av.synchronize(); // wait for the parallel_for_each loop to finish and copy the data
But what if I don't want to wait for the parallel_for_each loop, and instead want to start copying data back from the GPU as soon as possible? Will the following work?
using namespace Concurrency;
array_view<int> av; // construction over host data omitted
parallel_for_each(extent<1>(number), [=](index<1> idx) restrict(amp)
{
    // do some intense computations on av
});
// I know that we won't be waiting to synch when I call this, but will we be waiting here
// until the data is available on the GPU end to START copying?
completion_future waitOnThis = av.synchronize_async();
// will this line execute before parallel_for_each has finished processing, or only once it
// has finished processing and the data from "av" has started copying back?
waitOnThis.wait();
I read about this topic on The Moth, but after reading the following I'm not really any wiser:
Please note that the parallel_for_each executes as if synchronous to the calling code, but in reality, it is asynchronous. I.e. once the parallel_for_each call is made and the kernel has been passed to the runtime, the some_code_B region continues to execute immediately by the CPU thread, while in parallel the kernel is executed by the GPU threads. However, if you try to access the (array or array_view) data that you captured in the lambda in the some_code_B region, your code will block until the results become available. Hence the correct statement: the parallel_for_each is as-if synchronous in terms of visible side-effects, but asynchronous in reality.
Answer 1:
I don't like the way this has been explained. A better way to think about it is that parallel_for_each queues work to the GPU, so it returns almost immediately. There are several ways your CPU-side code can block until the queued work is complete, for example by explicitly calling synchronize, or by accessing data from one of the array_view instances used within the parallel_for_each:
using namespace concurrency;
array_view<int> av; // construction over host data omitted
parallel_for_each(extent<1>(number), [=](index<1> idx) restrict(amp)
{
    // Queue (or schedule if you like) some intense computations on av
});
Host code can execute now. The AMP computations may or may not have started. If the code here accesses av then it will block until the work on the GPU is complete and the data in av has been written and can be synchronized with the host memory.
The call below returns a future, so it is also a scheduled task. It is not guaranteed to execute at any particular point. Once it is scheduled, it will block the thread it is running on until av is correctly synchronized with the host memory (as above).
completion_future waitOnThis = av.synchronize_async();
More host code can execute here. If the host code accesses av then it will block until the parallel_for_each has completed (as above). At some point the runtime will execute the future and block until av has synchronized with the host memory. If it is writable and has been changed then it will be copied back to the host memory.
waitOnThis.wait();
The call to wait will block until the future has completed (prior to calling wait there is no guarantee that anything has actually executed). At this point you are guaranteed that the GPU calculations are complete and that av can be accessed on the CPU.
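The same queue, overlap, then wait pattern can be sketched in portable standard C++ as a rough analogy. This is not C++ AMP: std::async and std::future stand in for parallel_for_each and completion_future, and the names double_all and overlap_work are hypothetical, chosen only for illustration. With the default launch policy the task may run on another thread or only when wait is called, which mirrors the point that nothing is guaranteed to have executed before wait.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the GPU kernel: doubles every element in place.
void double_all(std::vector<int>& data) {
    for (int& x : data) x *= 2;
}

// Hands off the "kernel", does unrelated host work, then waits before
// reading the data. Returns the sum of the doubled data plus the host result.
long long overlap_work(std::vector<int>& data) {
    // Work is handed off and a future comes back immediately. With the
    // default launch policy the task may not have started yet.
    std::future<void> pending = std::async(double_all, std::ref(data));
    assert(pending.valid());

    // Independent host work: it must not touch `data` before the wait.
    long long host_result = 0;
    for (int i = 1; i <= 100; ++i) host_result += i;

    // Blocks until the task has completed, like completion_future::wait().
    pending.wait();

    // Only now is it safe to read `data` on this thread.
    long long sum = std::accumulate(data.begin(), data.end(), 0LL);
    return sum + host_result;
}
```

Calling overlap_work on a vector holding 1, 2, 3 leaves it holding 2, 4, 6. The key design point is that the only read of the shared data happens after wait returns.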
Having said all that, adding the waitOnThis future seems to be overcomplicating matters.
array_view<int> av; // construction over host data omitted
parallel_for_each(extent<1>(number), [=](index<1> idx) restrict(amp)
{
    // do some intense computations on av on the GPU
});
// do some independent CPU computation here.
av.synchronize();
// do some computation on the CPU that relies on av here.
The MSDN docs aren't very good on this topic. The following blog post is better. There are some other posts on the async APIs on the same blog.
Source: https://stackoverflow.com/questions/19830470/will-array-view-synchronize-asynch-wait-for-parallel-for-each-completion