I have an array of struct and I need to sort this array according to a property of the struct (N). The object looks like this:
struct OBJ
{
int N; //sort a
Yes I would totally agree, the overhead of sorting small arrays (<5k elements) kills the possible speedup you will achieve with a "fine-tuned" parallel sorting algorithm implemented in CUDA. I would prefer CPU based sorting for such a small size...
Why exactly are you heading towards CUDA? I mean, it smells like your problem is not one of those, CUDA is very good at. You just want to sort an array of 512 Elements and let some pointers refer to another location. This is nothing fancy, use a simple serial algorithm for that, e.g. Quicksort, Heapsort or Mergesort.
Additionally, think about the overhead it takes to copy data from your Heap/Stack to your CUDA device. Using CUDA just makes sense, when the calculations are intense enough so that COMPUTING_TIME_ON_CUDA+COPY_DATA_FROM_HEAP_TO_CUDA_DEVICE+COPY_DATA_FROM_CUDA_DEVICE_TO_HEAP < COMPUTING_TIME_ON_HOST_CPU
.
Besides, CUDA is immersely powerful at math calculations with big vectors and matrices and rather simple data-types (numbers) because it is one of the problems that often arise on a GPU: Calculating graphics.
What means "big" and "small" ?
By "big" I assume you mean something of >1M elements, while small --- small enough to actually fit in shared memory (probably <1K elements). If my understanding of "small" matches yours, I would try the following:
Some pages on bitonic sort:
I once also implemented a bubble sort (lol!) for a single warp to sort arrays of 32 elements. Thanks to its simplicity it did not perform that bad actually. A well tuned bitonic sort will still perform faster though.
Use the sorting calls available in the CUDPP or the Thrust library.
If you use cudppSort, note that it only works with integers or floats. To sort your array of structures, you can first sort the keys along with an index array. Later, you can use the sorted index array to move the structures to their final sorted location. I have described how to do this for the cudppCompact compaction algorithm in a blog post here. The steps are similar for sorting an array of structs using cudppSort.