Edit: The code here still has some bugs in it, and it could do better in the performance department, but instead of trying to fix this, for t…
Although a uint32_t may have the same size as a float on a given architecture, by reinterpret-casting one into the other you are implicitly assuming that atomic increments, decrements, and the other bitwise operations are semantically equivalent on both types, which they are not. I doubt it works as expected.
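For what it's worth, here is a rough sketch of how the arithmetic can stay in float while only the compare-and-swap touches the bit pattern. This is not the post's code: atomic_float_add is a made-up name, and it assumes gcc's __sync builtins and that sizeof(float) == sizeof(uint32_t).

    #include <stdint.h>
    #include <string.h>

    // Sketch only: atomically add 'value' to *addr.  The addition is done on
    // float copies; the uint32_t view is used solely for the compare-and-swap,
    // which only checks bit-for-bit equality, so no integer increment
    // semantics are assumed on the bit pattern.
    inline void atomic_float_add(float *addr, float value)
    {
        volatile uint32_t *bits = reinterpret_cast<volatile uint32_t *>(addr);
        uint32_t oldbits, newbits;
        float oldval, newval;
        do {
            oldbits = *bits;                            // snapshot current bits
            memcpy(&oldval, &oldbits, sizeof oldval);   // bit pattern -> float
            newval  = oldval + value;                   // ordinary float addition
            memcpy(&newbits, &newval, sizeof newbits);  // float -> bit pattern
        } while (__sync_val_compare_and_swap(bits, oldbits, newbits) != oldbits);
    }

If another thread changed the value between the read and the CAS, the swap fails and the loop simply retries with the fresh bits.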
Just a note about this (I wanted to make a comment, but apparently new users aren't allowed to comment): using reinterpret_cast on references produces incorrect code with gcc 4.1 at -O3. This seems to be fixed in 4.4, where it works. Changing the reinterpret_casts to pointers, while slightly uglier, works in both cases.
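Concretely, the two forms being compared look something like this (illustrative only; f stands in for whatever float the original code casts):

    #include <stdint.h>

    void example(float &f)
    {
        // reference form -- the one reported to be miscompiled by gcc 4.1 at -O3
        uint32_t &bits_ref = reinterpret_cast<uint32_t &>(f);

        // pointer form -- slightly uglier, but reported to work on 4.1 and 4.4
        uint32_t *bits_ptr = reinterpret_cast<uint32_t *>(&f);

        (void)bits_ref;
        (void)bits_ptr;
    }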