Is there any way to directly use hardware-accelerated ray-triangle intersection in CUDA without using OptiX? This is analogous to how it is possible to use tensor cores dire