Question
I am working on a particle code that relies heavily on flush-to-zero (FTZ) for performance. However, there is a single floating-point comparison that I do not want flushed. One solution is inline PTX, but it introduces unnecessary instructions: PTX has no boolean type, only predicate registers, so the predicate has to be converted to an integer and then tested again.
C++ code:
float a, b;
if ( a < b ) do_something;
// compiles into SASS:
// FSETP.LT.FTZ.AND P0, PT, A, B, PT;
// @P0 DO_SOMETHING
PTX:
float a, b;
unsigned int p;
// setp.lt.f32 carries no .ftz modifier, so subnormal inputs are compared as-is
asm("{.reg .pred p; setp.lt.f32 p, %1, %2; selp.u32 %0, 1, 0, p;}" : "=r"(p) : "f"(a), "f"(b) );
if (p) do_something;
// compiled into SASS:
// FSETP.LT.AND P0, PT, A, B, PT;
// SEL R2, RZ, 0x1, !P0;
// ISETP.NE.AND P0, PT, R2, RZ, PT;
// @P0 DO_SOMETHING
Is there a way that I can do the non-FTZ comparison with a single instruction without coding the entire thing in PTX/SASS?
Source: https://stackoverflow.com/questions/29563307/how-to-prevent-ftz-for-a-single-line-in-cuda