Mix custom memory management and Thrust in CUDA

后端未结

关注

 1  937

In my project, I have implemented a custom memory allocator to avoid unneccessary calls to cudaMalloc once the application has \"warmed up\". Moreover, I use cu

相关标签:

1条回答

别那么骄傲

2021-01-01 04:19

Like all standard c++ containers, you can customize how thrust::device_vector allocates storage by providing it with your own "allocator". By default, thrust::device_vector's allocator is thrust::device_malloc_allocator, which allocates (deallocates) storage with cudaMalloc (cudaFree) when Thrust's backend system is CUDA.

Occasionally, it is desirable to customize the way device_vector allocates memory, such as in the OP's case, who would like to sub-allocate storage within a single large allocation performed at program initialization. This can avoid overhead which may be incurred by many individual calls to the underlying allocation scheme, in this case, cudaMalloc.

A simple way to provide device_vector a custom allocator is to inherit from device_malloc_allocator. One could in principle author an entire allocator from scratch, but with an inheritance approach, only the allocate and deallocate member functions need to be provided. Once the custom allocator is defined, it can be provided to device_vector as its second template parameter.

This example code demonstrates how to provide a custom allocator which prints a message upon allocation and deallocation:

#include <thrust/device_malloc_allocator.h>
#include <thrust/device_vector.h>
#include <iostream>

template<typename T>
  struct my_allocator : thrust::device_malloc_allocator<T>
{
  // shorthand for the name of the base class
  typedef thrust::device_malloc_allocator<T> super_t;

  // get access to some of the base class's typedefs

  // note that because we inherited from device_malloc_allocator,
  // pointer is actually thrust::device_ptr<T>
  typedef typename super_t::pointer   pointer;

  typedef typename super_t::size_type size_type;

  // customize allocate
  pointer allocate(size_type n)
  {
    std::cout << "my_allocator::allocate(): Hello, world!" << std::endl;

    // defer to the base class to allocate storage for n elements of type T
    // in practice, you'd do something more interesting here
    return super_t::allocate(n);
  }

  // customize deallocate
  void deallocate(pointer p, size_type n)
  {
    std::cout << "my_allocator::deallocate(): Hello, world!" << std::endl;

    // defer to the base class to deallocate n elements of type T at address p
    // in practice, you'd do something more interesting here
    super_t::deallocate(p,n);
  }
};

int main()
{
  // create a device_vector which uses my_allocator
  thrust::device_vector<int, my_allocator<int> > vec;

  // create 10 ints
  vec.resize(10, 13);

  return 0;
}

Here's the output:

$ nvcc my_allocator_test.cu -arch=sm_20 -run
my_allocator::allocate(): Hello, world!
my_allocator::deallocate(): Hello, world!

In this example, note that we hear from my_allocator::allocate() once upon vec.resize(10,13). my_allocator::deallocate() is invoked once when vec goes out of scope as it destroys its elements.

0 讨论(0)