I wanted to test the cuda implementation of xgels provided with CUDA 11.1, and it seems I cannot make it work properly. For instance, this code seems to run just fine: