I recently wrote a Vector 3 class, and I submitted my normalize() function for reviewal to a friend. He said it was good, but that I should multiply by the reciprocal where poss
If you think back to grade school, you'll recall that multiplication was harder than addition and division was harder than multiplication. Things aren't any different for the CPU.
Recall also that calculating the reciprocal involves a division, so unless you calculate the reciprocal once and use it three times, you won't see a speed up.