I recently wrote a Vector 3 class, and I submitted my normalize() function for reviewal to a friend. He said it was good, but that I should multiply by the reciprocal where poss
The CPU operation for (float) division is much more complicated than multiplication. The CPU has to do more. I am far from knowledgeable about hardware, but you can find a lot of info about common division implementation (based on newton-raphson algorithms, for example).
I would also be careful about always using multiplication of the reciprocal instead of division to gain CPU performance: they may not give exactly the same results. This may or may not matter depending on your application.