The question notes that the C standard is quite unclear about the uint_fast*_t family of types: on a gcc-4.4.4 linux x86_64 system, the types uint_fast16_t and uint_fast32_t both come out 8 bytes wide, yet multiplying them is slower than multiplying the plain 4-byte types.
Actual performance at runtime is a very complicated topic, with many factors ranging from RAM, hard disks, and the OS to the many processor-specific quirks. But this should give you a rough rundown:
N_fastX_t - the fastest integer type the platform offers that is at least X bits wide; it is allowed to be wider, and on x86_64 glibc uint_fast16_t and uint_fast32_t are both 8 bytes.
N_leastX_t - the smallest integer type that is at least X bits wide, favouring a compact memory footprint over speed.
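If you want to see what these typedefs resolve to on your own machine, a quick sizeof printout like the minimal sketch below will show the widths your <stdint.h> picked. The output is entirely platform dependent; the 8-byte results mentioned above are what a typical x86_64 Linux/glibc toolchain gives.

//Print the widths the fast/least typedefs resolve to on the current target.
#include <stdint.h>
#include <stdio.h>

int main(void) {
    printf("uint16_t       : %zu bytes\n", sizeof(uint16_t));
    printf("uint_least16_t : %zu bytes\n", sizeof(uint_least16_t));
    printf("uint_fast16_t  : %zu bytes\n", sizeof(uint_fast16_t));
    printf("uint32_t       : %zu bytes\n", sizeof(uint32_t));
    printf("uint_least32_t : %zu bytes\n", sizeof(uint_least32_t));
    printf("uint_fast32_t  : %zu bytes\n", sizeof(uint_fast32_t));
    return 0;
}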
The Multiplication problem?
Also, to answer why the larger fastX variable would be slower in multiplication: it is due to the very nature of binary multiplication, which works much like the long multiplication you were taught in school.
http://en.wikipedia.org/wiki/Binary_multiplier
//Assuming 4bit int
   0011      (3 in decimal)
 x 0101      (5 in decimal)
 ======
   0011      ("0011 x 1", bit 0 of 0101)
  0000-      ("0011 x 0", bit 1)
 0011--      ("0011 x 1", bit 2)
0000---      ("0011 x 0", bit 3)
=======
   1111      (15 in decimal)
However, it is important to know that a computer is a "logical idiot". While it is obvious to us humans to skip the all-zero rows, the computer will still work through them (it is cheaper than checking the condition and then working it out anyway). Hence this creates a quirk for a larger-sized variable holding the same value:
//Assuming 8bit int
       0000 0011      (3 in decimal)
     x 0000 0101      (5 in decimal)
     ============
       0000 0011      ("0000 0011 x 1", bit 0 of the multiplier)
      0000 0000-      ("0000 0011 x 0", bit 1)
     0000 0011--      ("0000 0011 x 1", bit 2)
    0000 0000---      ("0000 0011 x 0", bit 3)
   0000 0000----      (and the remaining all-zero rows for bits 4 to 7)
   .............      (will all be worked out just the same)
   =============
       0000 1111      (15 in decimal)
While I did not write out every one of the remaining zero additions in the multiplication process, it is important to note that the computer will "get them done". Hence it is natural that multiplying a larger variable takes longer than its smaller counterpart (and it is always good to avoid multiplications and divisions whenever possible).
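To make that concrete in code, here is a minimal shift-and-add sketch in C. It is not how the hardware is actually wired (real CPUs use dedicated multiplier circuits); it just expresses the same long-multiplication idea as a loop that runs once per bit of the chosen width, so a wider type means more rows to grind through. The function shift_add_mul and its parameters are made up purely for this illustration.

//Naive shift-and-add multiply that loops once per bit of the chosen width.
#include <stdint.h>
#include <stdio.h>

static uint64_t shift_add_mul(uint64_t a, uint64_t b, unsigned bits) {
    uint64_t result = 0;
    for (unsigned i = 0; i < bits; i++) {   //one loop iteration per bit of the width
        if (b & 1u)
            result += a << i;               //add the shifted multiplicand for a 1 bit
        b >>= 1;                            //a 0 bit skips the add, but the iteration still happens
    }
    return result;
}

int main(void) {
    printf("%llu\n", (unsigned long long)shift_add_mul(3, 5, 4));   //4 iterations,  prints 15
    printf("%llu\n", (unsigned long long)shift_add_mul(3, 5, 64));  //64 iterations, prints 15
    return 0;
}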
However, here comes the second quirk, and it may not apply to all processors. All CPU operations are counted in CPU cycles, and in each cycle dozens (or more) of such small addition operations are performed as seen above. As a result, thanks to various optimizations and CPU-specific quirks, an 8-bit addition may end up taking the same number of cycles as an 8-bit multiplication, and so on.
If it concerns you that much, refer to Intel's architecture manuals: http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
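If you would rather measure it than read about it, a crude timing sketch along the lines below can be used. The iteration count is an arbitrary choice, the results depend heavily on compiler flags and the exact CPU, and on many modern x86_64 chips the 32-bit and 64-bit multiply loops come out about the same, which is precisely the "may not apply to all processors" caveat above.

//Crude timing comparison of 32-bit vs 64-bit multiplication.
//Compile without heavy optimization (e.g. -O1) or the loops may be folded away;
//the volatile accumulators are an attempt to keep the work alive.
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define ITERS 100000000u   //arbitrary iteration count for the sketch

int main(void) {
    volatile uint32_t acc32 = 1;
    volatile uint64_t acc64 = 1;

    clock_t t0 = clock();
    for (uint32_t i = 1; i <= ITERS; i++) acc32 *= (uint32_t)(i | 1u);  //odd factor keeps acc nonzero
    clock_t t1 = clock();
    for (uint32_t i = 1; i <= ITERS; i++) acc64 *= (uint64_t)(i | 1u);
    clock_t t2 = clock();

    printf("32-bit multiply loop: %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("64-bit multiply loop: %.3f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}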
An additional note on CPU vs RAM
As CPUs have advanced along Moore's law to become several times faster than your DDR3 RAM, this can result in situations where more time is spent looking a variable up in RAM than it takes the CPU to "compute" it. This is most prominent in long pointer chains.
So while a CPU cache exists on most processors to reduce "RAM look-up" time, its usefulness is limited to specific cases (where the cache line benefits the most). For the cases where the data does not fit, note that the RAM look-up time is greater than the CPU processing time (excluding multiplications, divisions, and some quirks).
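As a rough illustration of the pointer-chain point, the sketch below links a few million nodes in a shuffled order and then walks the chain; every hop depends on the previous load, so once the working set outgrows the cache the traversal time is dominated by RAM latency rather than by the trivial additions along the way. The node count and shuffling scheme are arbitrary assumptions, chosen only to defeat a typical cache.

//Pointer chasing: each node lookup must wait for the previous load to finish,
//so with a working set larger than the cache the RAM latency dominates.
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct node { struct node *next; uint64_t value; };

int main(void) {
    const size_t count = 1u << 22;                  //~4M nodes, well past typical cache sizes

    struct node *nodes = malloc(count * sizeof *nodes);
    size_t *order = malloc(count * sizeof *order);
    if (!nodes || !order) return 1;

    //Shuffle the visit order (Fisher-Yates) so consecutive hops in the chain
    //land far apart in memory and the prefetcher cannot help much.
    for (size_t i = 0; i < count; i++) order[i] = i;
    for (size_t i = count - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        size_t tmp = order[i]; order[i] = order[j]; order[j] = tmp;
    }

    //Link the nodes into one big cycle following the shuffled order.
    for (size_t i = 0; i < count; i++) {
        nodes[order[i]].value = i;
        nodes[order[i]].next  = &nodes[order[(i + 1) % count]];
    }

    //Walk the chain: the additions are trivial, the dependent loads are not.
    uint64_t sum = 0;
    struct node *p = &nodes[order[0]];
    for (size_t i = 0; i < count; i++) {
        sum += p->value;
        p = p->next;
    }
    printf("checksum: %llu\n", (unsigned long long)sum);

    free(order);
    free(nodes);
    return 0;
}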