I've been benchmarking an algorithm; it's not necessary to know the details. The main components are a buffer (raw array of integers) and an indexer (integer - used to access
It depends heavily on the underlying architecture. Usually the fastest data types are those that are exactly one machine word wide. In my experience with IA-32 (32-bit x86), data types smaller or larger than a word incur penalties, sometimes even requiring more than one memory read for a single value.
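One way to check this on your own machine is to run the same loop over buffers that differ only in element width. The following is a minimal sketch, not the asker's algorithm: the `DEFINE_SUM` macro, the `sum_*` functions and the `time_sum_u32` helper are all hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Generate one summing loop per element type, so that the element
 * width is the only thing that changes between the variants. */
#define DEFINE_SUM(name, type)                             \
    static uint64_t name(const type *buf, size_t n) {      \
        assert(buf != NULL || n == 0);                     \
        uint64_t total = 0;                                \
        for (size_t i = 0; i < n; i++)                     \
            total += buf[i];                               \
        return total;                                      \
    }

DEFINE_SUM(sum_u8, uint8_t)
DEFINE_SUM(sum_u16, uint16_t)
DEFINE_SUM(sum_u32, uint32_t)
DEFINE_SUM(sum_u64, uint64_t)

/* Hypothetical timing helper: returns the CPU seconds spent on `reps`
 * passes of sum_u32 and stores the accumulated sum in *out, so the
 * compiler cannot discard the work as unused. */
static double time_sum_u32(const uint32_t *buf, size_t n, int reps,
                           uint64_t *out) {
    clock_t start = clock();
    uint64_t acc = 0;
    for (int r = 0; r < reps; r++)
        acc += sum_u32(buf, n);
    *out = acc;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Writing an equivalent helper for each width and comparing the timings on a large buffer should show whether narrower or wider elements cost anything on a given CPU. Keep in mind that an optimizing compiler may vectorize or restructure such loops, so the numbers reflect the compiler and its flags as much as the hardware.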
Once the data is in CPU registers, its length usually doesn't matter (as long as the whole value fits in one register, that is); what matters is which operations you perform on it. Floating-point operations generally tend to be the most costly; the fastest are addition, subtraction (which is also how comparison is done), bitwise operations (shifts and the like), and logical operations (and, or...).
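To make the "cheap operations" concrete, here is a small sketch of indexer-style arithmetic that uses only AND, shift and comparison. The function names are hypothetical, and the power-of-two buffer size is an assumption added for the example, not something stated in the question.

```c
#include <assert.h>
#include <stdint.h>

/* Wrap an index into a buffer whose size is a power of two.
 * A single AND replaces the more expensive remainder (i % size). */
static uint32_t wrap_index(uint32_t i, uint32_t size_pow2) {
    /* Assumption: size_pow2 is a non-zero power of two. */
    assert(size_pow2 != 0 && (size_pow2 & (size_pow2 - 1)) == 0);
    return i & (size_pow2 - 1);
}

/* Multiply by 8 with a left shift instead of a multiplication. */
static uint32_t times8(uint32_t x) {
    return x << 3;
}

/* Three-way comparison: returns -1, 0 or 1. On x86 a comparison is
 * carried out as a subtraction whose result is discarded and whose
 * flags are kept. */
static int compare_u32(uint32_t a, uint32_t b) {
    return (a > b) - (a < b);
}
```

Modern compilers usually apply these rewrites on their own when the operands are known constants, so hand-written versions like this mostly help when the compiler cannot prove the size is a power of two.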