Copy data from GPU to CPU


Your code (as is) doesn't compile; below is a fixed version which I think has the same intent. If you want to break out the copy time from the compute time, the simplest thing to do is to use array<> and explicit copies.

        #include <amp.h>     // concurrency::array, parallel_for_each, copy
        #include <vector>

        int _height, _width;
        _height = _width = 3000;
        std::vector<int> _main(_height * _width); // Host data.
        concurrency::extent<2> ext(_height, _width);
        // Start timing data copy
        concurrency::array<int, 2> GPU_main(ext /* default accelerator */);
        concurrency::array<int, 2> GPU_res(ext);
        concurrency::array<int, 2> GPU_temp(ext);
        concurrency::copy(begin(_main), end(_main), GPU_main);
        // Finish timing data copy
        int number = 20000;
        // Start timing compute
        for(int i = 0; i < number; ++i)
        {
            concurrency::parallel_for_each(ext,
                [=, &GPU_res, &GPU_main](concurrency::index<2> idx) restrict(amp)
            {
               GPU_res(idx) = GPU_main(idx) + idx[0];
            });
            concurrency::copy(GPU_res, GPU_temp);   // Swap arrays on the GPU...
            concurrency::copy(GPU_main, GPU_res);   // ...via a temporary.
            concurrency::copy(GPU_temp, GPU_main);
        }
        GPU_main.accelerator_view.wait(); // Wait for compute to finish
        // Finish timing compute
        // Start timing data copy
        concurrency::copy(GPU_main, begin(_main));
        // Finish timing data copy
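
As an aside, the three copy calls above move the entire 3000x3000 array across the GPU three times per iteration just to exchange roles. A sketch of a cheaper alternative (my own variation, not part of the original answer) is to swap host-side pointers to the arrays instead, which moves no data at all:

        concurrency::array<int, 2>* src = &GPU_main;
        concurrency::array<int, 2>* dst = &GPU_res;
        for(int i = 0; i < number; ++i)
        {
            concurrency::array<int, 2>& in = *src;   // References so the lambda can capture the arrays
            concurrency::array<int, 2>& out = *dst;
            concurrency::parallel_for_each(ext,
                [&out, &in](concurrency::index<2> idx) restrict(amp)
            {
               out(idx) = in(idx) + idx[0];
            });
            std::swap(src, dst); // Swap roles on the host; needs <utility>
        }
        // After the loop the most recent result is in *src.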

Note the wait() call to force the compute to finish. Remember that C++ AMP commands usually just queue work on the GPU; the work is only guaranteed to have executed once you wait for it, either explicitly with wait() or implicitly by calling (for example) synchronize() on an array_view<>. To get a good idea of the timing you should really time the compute and the data copies separately (as shown above). You can find some basic timing code in Timer.h here: http://ampbook.codeplex.com/SourceControl/changeset/view/100791#1983676. There are examples of its use in the same folder.
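
If you don't want to pull in that Timer.h, a minimal std::chrono-based stopwatch along the same lines might look like this (my own sketch approximating it, not the code from the link):

        #include <chrono>

        // Minimal stopwatch, approximating what the book's Timer.h provides.
        class Timer
        {
            std::chrono::steady_clock::time_point m_start;
        public:
            void Start() { m_start = std::chrono::steady_clock::now(); }
            double ElapsedMs() const
            {
                auto end = std::chrono::steady_clock::now();
                return std::chrono::duration<double, std::milli>(end - m_start).count();
            }
        };

Start it just before the work you want to measure, and remember to wait() before reading the elapsed time, or you will only measure the cost of queuing the work.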

However, I'm not sure I would really write the code this way unless I wanted to break out the copy and compute times. It is far simpler to use array<> for data that lives purely on the GPU and array_view<> for data that is copied to and from the GPU.

This would look like the code below.

        int _height, _width;
        _height = _width = 3000;
        std::vector<int> _main(_height * _width); // Host data.
        concurrency::extent<2> ext(_height, _width);
        // The array_view wraps the host vector, so no explicit copy is needed;
        // the data moves to the GPU on first use.
        concurrency::array_view<int, 2> _main_av(ext, _main);
        concurrency::array<int, 2> GPU_res(ext);
        concurrency::array<int, 2> GPU_temp(ext);
        int number = 20000;
        // Start timing compute and possibly copy
        for(int i = 0; i < number; ++i)
        {
            // Arrays are captured by reference, array_views by value.
            concurrency::parallel_for_each(ext,
                [=, &GPU_res](concurrency::index<2> idx) restrict(amp)
            {
               GPU_res(idx) = _main_av(idx) + idx[0];
            });
            concurrency::copy(GPU_res, GPU_temp);  // Swap arrays on the GPU
            concurrency::copy(_main_av, GPU_res);
            concurrency::copy(GPU_temp, _main_av);
        }
        _main_av.synchronize();  // Waits for all queued work to finish
        // Finish timing compute & copy

Now the data that is only required on the GPU is declared as living on the GPU, and the data that needs to be synchronized with the host is declared as such. Clearer, and less code.

You can find out more about this by reading my book on C++ AMP :)

user1274899

How did you measure the timing? You need to wait on the accelerator_view after the parallel_for_each and before doing the copy to get accurate timings for the computation and the copy (see the sketch after the list below). You may want to check out the following blog posts for some tips on measuring the performance of C++ AMP programs:

  1. How to measure the performance of C++ AMP algorithms (2011)
  2. How to measure the performance of C++ AMP algorithms (2012)
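
A minimal sketch of that pattern, reusing the names from the first code block and the hypothetical Timer class from earlier (my illustration, not code from the blog posts):

        Timer t;
        t.Start();
        // ... the parallel_for_each loop from above ...
        GPU_main.accelerator_view.wait();           // Force the kernels to finish
        double computeMs = t.ElapsedMs();           // Compute time only

        t.Start();
        concurrency::copy(GPU_main, begin(_main));
        double copyMs = t.ElapsedMs();              // Copy time only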