I have an array of structs and I need to sort it by one member of the struct (N). The struct looks like this:
struct OBJ
{
int N; //sort a
Why exactly are you heading towards CUDA? This does not look like one of the problems CUDA is good at. You just want to sort an array of 512 elements and have some pointers refer to another location. That is nothing fancy; use a simple serial algorithm for it, e.g. Quicksort, Heapsort or Mergesort.
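To illustrate the serial route, here is a minimal sketch of sorting on the host with std::sort. It assumes a cut-down OBJ holding only the key N (the remaining members are not shown in the question), and sort_by_n is a hypothetical helper name, not something from the original post.

```cpp
#include <algorithm>
#include <cstddef>

// Assumption: only the sort key N is known from the question; the other
// members of OBJ are not shown there and are left out here.
struct OBJ {
    int N;  // sort key
};

// Sort `count` elements in place by ascending N, entirely on the host CPU.
inline void sort_by_n(OBJ* objs, std::size_t count) {
    std::sort(objs, objs + count,
              [](const OBJ& a, const OBJ& b) { return a.N < b.N; });
}
```

For an array such as OBJ objs[512], a call like sort_by_n(objs, 512) is all that is needed. std::sort is not stable; if elements with equal N must keep their relative order, std::stable_sort takes the same arguments.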
Additionally, think about the overhead of copying the data from your heap/stack to the CUDA device and back. Using CUDA only makes sense when the calculations are intensive enough that COMPUTING_TIME_ON_CUDA + COPY_DATA_FROM_HEAP_TO_CUDA_DEVICE + COPY_DATA_FROM_CUDA_DEVICE_TO_HEAP < COMPUTING_TIME_ON_HOST_CPU.
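The break-even condition can be written down as a small check. This is only a sketch: gpu_offload_pays_off is a hypothetical helper, and its four arguments are assumed to be times you measured yourself, all in the same unit.

```cpp
// Hypothetical helper: all four arguments are measured wall-clock times in
// the same unit (e.g. milliseconds). Nothing here is part of the CUDA API.
inline bool gpu_offload_pays_off(double gpu_compute, double copy_to_device,
                                 double copy_to_host, double cpu_compute) {
    // Offloading only wins if compute plus both transfers beat the CPU time.
    return gpu_compute + copy_to_device + copy_to_host < cpu_compute;
}
```

For a small array, the two copy terms alone can exceed the time the CPU needs to sort it.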
Besides, CUDA is immensely powerful at math calculations on big vectors and matrices of rather simple data types (numbers), because that is the kind of problem a GPU was built for: computing graphics.