CUDA: Why are bitwise operators sometimes faster than logical operators?
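To make the substitution in the title concrete, here is a minimal sketch of the two forms. The names `in_range_logical`, `in_range_bitwise`, `counted_true` and the `HD` macro are hypothetical and are not taken from any real kernel. The sketch is written as portable C++ so that it should build both with nvcc and with an ordinary host compiler. It shows only the semantic difference: `&&` skips its right operand when the left is false, while `&` always evaluates both. Whether the compiler actually emits a branch for `&&` depends on the code and the compiler version, and this sketch does not demonstrate that.

```cpp
#include <cassert>

// Assumption: HD marks functions as host+device under nvcc and expands to
// nothing under a plain C++ compiler, so the same file builds either way.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Short-circuit form: (x < hi) is evaluated only if (x >= lo) is true.
HD inline bool in_range_logical(int x, int lo, int hi) {
    return (x >= lo) && (x < hi);
}

// Bitwise form: both comparisons are always evaluated and their bool
// results (0 or 1) are ANDed, so no control flow is needed to combine them.
HD inline bool in_range_bitwise(int x, int lo, int hi) {
    return (x >= lo) & (x < hi);
}

// For side-effect-free bool operands the two forms give the same answer.
inline void check_equivalence() {
    for (int x = -2; x <= 12; ++x) {
        assert(in_range_logical(x, 0, 10) == in_range_bitwise(x, 0, 10));
    }
}

// Helper with a visible side effect: counts how many times it is called.
inline bool counted_true(int& calls) {
    ++calls;
    return true;
}

// How many times is the right operand evaluated when the left is false?
inline int right_evals_logical() {
    int calls = 0;
    bool left = false;
    bool r = left && counted_true(calls);  // right operand is skipped
    (void)r;
    return calls;
}

inline int right_evals_bitwise() {
    int calls = 0;
    bool left = false;
    bool r = left & counted_true(calls);   // right operand still runs
    (void)r;
    return calls;
}
```

The two forms are interchangeable only when the right operand is safe to evaluate unconditionally. A guard such as `(i < n) && (a[i] > 0)` relies on short-circuiting to avoid the out-of-bounds read, so rewriting it with `&` would change its behavior.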
Question: When I am squeezing the last bit of performance out of a kernel, I usually find that replacing the logical operators (&& and ||) with the bitwise operators (& and |) makes the kernel slightly faster. I observed this in the kernel time summary of the CUDA Visual Profiler. So why are bitwise operators faster than logical operators in CUDA? I must admit that they are not always faster, but they often are. I wonder what magic produces this speedup. Disclaimer: I am