What is a good sorting algorithm on CUDA?

南笙 2021-02-04 13:09

I have an array of structs and I need to sort this array by one property of the struct (N). The object looks like this:

 struct OBJ
 { 
   int N; //sort a         


        
4 answers
  • 2021-02-04 13:19

    Yes, I totally agree: the overhead of sorting small arrays (<5k elements) on the GPU kills any speedup you could achieve with a "fine-tuned" parallel sorting algorithm implemented in CUDA. I would prefer CPU-based sorting for such a small size.

  • 2021-02-04 13:27

    Why exactly are you heading towards CUDA? It smells like your problem is not one of those that CUDA is very good at. You just want to sort an array of 512 elements and let some pointers refer to another location. This is nothing fancy; use a simple serial algorithm for that, e.g. Quicksort, Heapsort or Mergesort.
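    For reference, a minimal sketch of the serial route in host-side C++. The question's struct is cut off in the post, so `Obj` and its `payload` member are hypothetical stand-ins; only the key `N` comes from the question. `sortByN` is a name made up for this example.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the question's struct: only the key N is
// taken from the original post; `payload` is invented for illustration.
struct Obj {
    int N;        // sort key
    int payload;  // placeholder for whatever else the struct holds
};

// Sort in place by ascending N with the standard library's serial sort.
void sortByN(std::vector<Obj>& objs) {
    std::sort(objs.begin(), objs.end(),
              [](const Obj& a, const Obj& b) { return a.N < b.N; });
}
```

    For 512 elements this runs entirely on the CPU, with no device copies involved.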

    Additionally, think about the overhead of copying data from your heap/stack to your CUDA device. Using CUDA only makes sense when the calculations are intense enough that COMPUTING_TIME_ON_CUDA+COPY_DATA_FROM_HEAP_TO_CUDA_DEVICE+COPY_DATA_FROM_CUDA_DEVICE_TO_HEAP < COMPUTING_TIME_ON_HOST_CPU.

    Besides, CUDA is immensely powerful at math on big vectors and matrices with rather simple data types (numbers), because that is the kind of problem that most often arises on a GPU: computing graphics.

  • 2021-02-04 13:31

    What do "big" and "small" mean?

    By "big" I assume you mean something with >1M elements, and by "small" something small enough to actually fit in shared memory (probably <1K elements). If my understanding of "small" matches yours, I would try the following:

    • Use only a single block to sort the array (it can then be part of a bigger CUDA kernel)
    • Bitonic sort is one of the good approaches that adapts well to a parallel algorithm.
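    To make the idea concrete, here is a rough sketch of the bitonic sorting network, written as serial host-side C++ rather than as a kernel. It assumes the length is a power of two, and the function name `bitonicSort` is made up for this example. The innermost loop over `i` is the part a single CUDA block would run with one thread per index, with a barrier such as `__syncthreads()` after every `(k, j)` step.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Bitonic sorting network, ascending order, power-of-two length only.
// Serial sketch: in a kernel, each value of i would be one thread.
void bitonicSort(std::vector<int>& a) {
    const std::size_t n = a.size();
    assert(n != 0 && (n & (n - 1)) == 0);  // assumption: n is a power of two

    // k: length of the bitonic sequences being merged in this stage.
    for (std::size_t k = 2; k <= n; k <<= 1) {
        // j: distance between the two elements of each compare-exchange.
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {
            // Pairs (i, i ^ j) are disjoint within one (k, j) step, so
            // all of them could run in parallel.
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner > i) {  // handle each pair once
                    const bool ascending = (i & k) == 0;
                    const bool outOfOrder = ascending ? (a[i] > a[partner])
                                                      : (a[i] < a[partner]);
                    if (outOfOrder) std::swap(a[i], a[partner]);
                }
            }
        }
    }
}
```

    If the array is not a power of two in length, one common workaround is to pad it with a sentinel larger than any real key; that padding is not shown here.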

    Some pages on bitonic sort:

    • Bitonic sort (nice explanation, an applet to visualise it, and Java source which does not take too much space)
    • Wikipedia (a bit too short an explanation for my taste, but more source code: some abstract language and Java)
    • NVIDIA code Samples (a sample source in CUDA. I think it is a bit overfocused on killing bank conflicts. I believe simpler code may actually perform faster)

    I once also implemented a bubble sort (lol!) for a single warp to sort arrays of 32 elements. Thanks to its simplicity it actually did not perform that badly. A well-tuned bitonic sort will still be faster, though.
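    The answer does not say which bubble sort variant was used. As an assumption, the sketch below uses odd-even transposition sort, the usual way to parallelise bubble sort, on a fixed 32-element array. It is serial host-side C++; `kWarpSize` and `oddEvenSort` are names made up for this example. Each pass of the inner loop touches disjoint neighbour pairs, so in a warp each pair could be handled by its own thread.

```cpp
#include <array>
#include <cstddef>
#include <utility>

constexpr std::size_t kWarpSize = 32;  // assumption: one element per warp lane

// Odd-even transposition sort: kWarpSize phases, alternating between the
// even pairs (0,1),(2,3),... and the odd pairs (1,2),(3,4),...
void oddEvenSort(std::array<int, kWarpSize>& a) {
    for (std::size_t phase = 0; phase < kWarpSize; ++phase) {
        // The pairs visited within one phase do not overlap.
        for (std::size_t i = phase % 2; i + 1 < kWarpSize; i += 2) {
            if (a[i] > a[i + 1]) std::swap(a[i], a[i + 1]);
        }
    }
}
```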

  • 2021-02-04 13:32

    Use the sorting calls available in the CUDPP or the Thrust library.

    If you use cudppSort, note that it only works with integers or floats. To sort your array of structures, you can first sort the keys along with an index array. Later, you can use the sorted index array to move the structures to their final sorted location. I have described how to do this for the cudppCompact compaction algorithm in a blog post here. The steps are similar for sorting an array of structs using cudppSort.
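    The key-plus-index steps above can be sketched on the host in plain C++. This is not CUDPP code: `std::stable_sort` stands in for the GPU key-value sort (in Thrust, `thrust::sort_by_key` plays that role), and `Obj`, `payload` and `sortStructsByKey` are hypothetical names, since the question's struct is cut off after `N`.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the question's struct; only N is from the post.
struct Obj {
    int N;        // sort key
    int payload;  // placeholder for the remaining members
};

std::vector<Obj> sortStructsByKey(const std::vector<Obj>& objs) {
    // Step 1: pull the keys out into a plain int array and build an
    // index array 0, 1, 2, ... alongside it.
    std::vector<int> keys(objs.size());
    std::vector<std::size_t> index(objs.size());
    for (std::size_t i = 0; i < objs.size(); ++i) keys[i] = objs[i].N;
    std::iota(index.begin(), index.end(), std::size_t{0});

    // Step 2: sort the indices by their keys. On the GPU this is where
    // the library's key-value sort would run on the two flat arrays.
    std::stable_sort(index.begin(), index.end(),
                     [&keys](std::size_t a, std::size_t b) {
                         return keys[a] < keys[b];
                     });

    // Step 3: gather the structs into their final sorted positions.
    std::vector<Obj> sorted(objs.size());
    for (std::size_t i = 0; i < objs.size(); ++i) sorted[i] = objs[index[i]];
    return sorted;
}
```

    The point of the detour is that the sort only ever sees integers; the structs are moved once, in the final gather.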
