You can find a complete answer in this article: *What Every Computer Scientist Should Know About Floating-Point Arithmetic*
This is a quote from a previous Stack Overflow thread, about how `float` and `double` variables affect memory bandwidth:
> If a double requires more storage than a float, then it will take longer to read the data. That's the naive answer. On a modern IA32, it all depends on where the data is coming from. If it's in L1 cache, the load cost is negligible, provided the data comes from a single cache line; if it spans more than one cache line there's a small overhead. If it's from L2 it takes a while longer, if it's in RAM it's longer still, and finally, if it's on disk it's a huge time.
>
> So the choice of float or double is less important than the way the data is used. If you want to do a small calculation on lots of sequential data, a small data type is preferable. Doing a lot of computation on a small data set would allow you to use bigger data types without any significant effect. If you're accessing the data very randomly, then the choice of data size is unimportant: data is loaded in pages/cache lines, so even if you only want a byte from RAM, you could get 32 bytes transferred (this is very dependent on the architecture of the system).
>
> On top of all of this, the CPU/FPU could be superscalar and pipelined, so even though a load may take several cycles, the CPU/FPU could be busy doing something else (a multiply, for instance) that hides the load time to a degree.