When understanding how primitive operators such as +, -, * and / are implemented in C, I found the following snippet from an
The code that you found tries to explain how very primitive computer hardware might implement an "add" instruction. I say "might" because I can guarantee that this method isn't used by any CPU, and I'll explain why.
In normal life, you use decimal numbers and you have learned how to add them: To add two numbers, you start by adding the two lowest digits. If the result is less than 10, you write down the result and proceed to the next digit position. If the result is 10 or more, you write down the result minus 10 and proceed to the next digit, but you remember to add 1 more there. For example, for 23 + 37 you add 3 + 7 = 10, so you write down 0 and remember to add 1 more at the next position. At the 10s position, you add (2 + 3) + 1 = 6 and write that down. The result is 60.
You can do the exact same thing with binary numbers. The difference is that the only digits are 0 and 1, so the only possible sums of two digits are 0, 1 and 2 (or 3 once you include a 1 remembered from the previous position). For a 32 bit number, you would handle one digit position after the other. And that is how really primitive computer hardware would do it.
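To make the digit-by-digit method concrete, here is a minimal sketch in C. The function name ripple_add and its variable names are mine, not from the question; it only models what the hardware would do, one bit position per loop step.

```c
#include <stdint.h>

/* Add two 32-bit values one bit position at a time, lowest bit first.
   Hypothetical helper for illustration only. */
uint32_t ripple_add(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    unsigned carry = 0;   /* the "1 more" remembered from the previous position */

    for (int i = 0; i < 32; i++) {
        unsigned abit = (a >> i) & 1u;
        unsigned bbit = (b >> i) & 1u;
        unsigned sum  = abit + bbit + carry;    /* 0, 1, 2 or 3 */

        result |= (uint32_t)(sum & 1u) << i;    /* digit written down at this position */
        carry = sum >> 1;                       /* 1 if the sum was 2 or 3 */
    }
    return result;        /* a carry out of bit 31 is simply dropped */
}
```

Calling ripple_add(23, 37) gives 60, the same result as the decimal example above.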
This code works differently. You know the sum of two binary digits is 2 if both digits are 1. So if both digits are 1, you would add 1 more at the next binary position and write down 0. That's what the calculation of t does: It finds all places where both binary digits are 1 (that's the &) and moves them to the next digit position (<< 1). Then it does the addition: 0+0 = 0, 0+1 = 1, 1+0 = 1, 1+1 is 2, but we write down 0. That's what the exclusive or operator (^) does.
But all the 1's that you had to handle in the next digit position haven't been handled. They still need to be added. That's why the code does a loop: In the next iteration, all the extra 1's are added.
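A loop of the kind described above might look like the following sketch. This is my reconstruction from the description, using the names x, y and t mentioned here; the snippet you found may differ in its details. I use unsigned types so that the shift is well defined.

```c
#include <stdint.h>

/* Add x and y using only &, ^ and <<.
   Reconstruction for illustration; not necessarily the original snippet. */
uint32_t loop_add(uint32_t x, uint32_t y)
{
    while (x != 0) {
        uint32_t t = (x & y) << 1;  /* places where both bits are 1, moved up one position */
        y ^= x;                     /* per-bit sum with the carries left out */
        x = t;                      /* the extra 1's still to be added in the next iteration */
    }
    return y;
}
```

When x becomes 0 there are no extra 1's left to add, and y holds the sum.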
Why does no processor do it that way? Because it's a loop, and processors don't like loops, and it is slow. It's slow because in the worst case, 32 iterations are needed: If you add 1 to the number 0xffffffff (32 1-bits), then the first iteration clears bit 0 of y and sets x to 2. The second iteration clears bit 1 of y and sets x to 4. And so on. It takes 32 iterations to get the result. Worse, each iteration has to process all bits of x and y, which takes a lot of hardware.
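If you want to check the worst case yourself, a small counting variant of the loop will do. The function name loop_add_iterations is made up for this sketch, and the loop body is my reconstruction from the description, with x as the value that carries the extra 1's.

```c
#include <stdint.h>

/* Count how many iterations the &/^ loop needs before no carries remain.
   Hypothetical helper for illustration only. */
unsigned loop_add_iterations(uint32_t x, uint32_t y)
{
    unsigned n = 0;
    while (x != 0) {
        uint32_t t = (x & y) << 1;  /* carries for the next position */
        y ^= x;                     /* sum without carries */
        x = t;
        n++;
    }
    return n;
}
```

loop_add_iterations(1, 0xffffffff) returns 32, while loop_add_iterations(23, 37) returns only 3, so the cost depends on how far a carry has to travel.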
A primitive processor would do things just as quickly by working the way you do decimal arithmetic, from the lowest position to the highest. It also takes 32 steps, but each step processes only two bits plus one value from the previous bit position, so it is much easier to implement. And even in a primitive computer, one can afford to do this without having to implement loops.
A modern, fast and complex CPU will use a "conditional sum adder". Especially if the number of bits is high, for example a 64 bit adder, it saves a lot of time.
A 64 bit adder consists of two parts: First, a 32 bit adder for the lowest 32 bits. That 32 bit adder produces a sum and a "carry" (an indicator that a 1 must be added to the next bit position). Second, two 32 bit adders for the higher 32 bits: One adds x + y, the other adds x + y + 1. All three adders work in parallel. Then, when the first adder has produced its carry, the CPU just picks which of the two results, x + y or x + y + 1, is the correct one, and you have the complete result. So a 64 bit adder only takes a tiny bit longer than a 32 bit adder, not twice as long.
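Here is a rough software model of that arrangement. The names add32 and select_add64 are invented for this sketch, and C of course runs the three additions one after the other, whereas the hardware would run them at the same time.

```c
#include <stdint.h>

/* One 32-bit adder: returns the 32-bit sum and reports the carry out of bit 31.
   Hypothetical helper for illustration only. */
static uint32_t add32(uint32_t a, uint32_t b, unsigned carry_in, unsigned *carry_out)
{
    uint64_t wide = (uint64_t)a + b + carry_in;
    *carry_out = (unsigned)(wide >> 32);
    return (uint32_t)wide;
}

/* 64-bit addition built from three 32-bit additions and one selection. */
uint64_t select_add64(uint64_t x, uint64_t y)
{
    uint32_t xl = (uint32_t)x, xh = (uint32_t)(x >> 32);
    uint32_t yl = (uint32_t)y, yh = (uint32_t)(y >> 32);
    unsigned carry_low, ignored;

    uint32_t low   = add32(xl, yl, 0, &carry_low); /* lowest 32 bits, produces the carry */
    uint32_t high0 = add32(xh, yh, 0, &ignored);   /* higher 32 bits: x + y */
    uint32_t high1 = add32(xh, yh, 1, &ignored);   /* higher 32 bits: x + y + 1 */

    uint32_t high = carry_low ? high1 : high0;     /* pick the correct upper half */
    return ((uint64_t)high << 32) | low;
}
```

The point of the design is that the selection at the end is cheap, so nothing has to wait for the carry to travel through the upper 32 bits.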
The 32 bit adder parts are again implemented as conditional sum adders, using multiple 16 bit adders, and the 16 bit adders are conditional sum adders, and so on.
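The "and so on" can be modelled with a recursive function that keeps halving the width until it reaches single bits. This is only a sketch of the idea: the names partial, cond_add and cond_add64 are mine, and real hardware lays this out as parallel logic rather than as recursive calls.

```c
#include <stdint.h>

/* Sum of a block of bits together with the carry out of that block.
   Hypothetical type for illustration only. */
typedef struct {
    uint64_t sum;
    unsigned carry;
} partial;

/* Add the lowest `bits` bits of a and b plus carry_in.
   `bits` is assumed to be a power of two between 1 and 64. */
static partial cond_add(uint64_t a, uint64_t b, unsigned carry_in, unsigned bits)
{
    partial r;
    if (bits == 1) {
        unsigned s = (unsigned)(a & 1u) + (unsigned)(b & 1u) + carry_in;
        r.sum = s & 1u;
        r.carry = s >> 1;
        return r;
    }

    unsigned half = bits / 2;
    uint64_t mask = ((uint64_t)1 << half) - 1;

    partial low   = cond_add(a & mask, b & mask, carry_in, half);
    partial high0 = cond_add(a >> half, b >> half, 0, half);  /* upper half, no carry */
    partial high1 = cond_add(a >> half, b >> half, 1, half);  /* upper half, with carry */
    partial high  = low.carry ? high1 : high0;                /* select once the carry is known */

    r.sum = (high.sum << half) | low.sum;
    r.carry = high.carry;
    return r;
}

uint64_t cond_add64(uint64_t x, uint64_t y)
{
    return cond_add(x, y, 0, 64).sum;
}
```

In software this does far more work than a plain addition, since both candidates are computed at every level; the benefit only exists in hardware, where all the blocks at one level run at the same time.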