When reading through CUDA 5.0 Programming Guide I stumbled on a feature called \"Funnel shift\" which is present in 3.5 compute-capable device, but not 3.0. It contains an annot
In the case of CUDA, two 32-bit registers are concatenated together into a 64-bit value; that value is shifted left or right; and the most significant (for a left shift) or least significant (for right shift) 32 bits are returned.
The intrinsics from sm_35_intrinsics.h
are as follows:
unsigned int __funnelshift_lc(unsigned int lo, unsigned int hi, unsigned int shift);
unsigned int __funnelshift_rc(unsigned int lo, unsigned int hi, unsigned int shift);
According to Andy Glew (dead link removed), applications for funnel shift include fast misaligned memcpy; and as njuffa mentions in the comments above, it can be used to implement rotate if the two input words are the same.