Multiplying two binary numbers takes n^2 time, yet squaring a number can be done more efficiently somehow. (with n being the number of bits) How could that be?
Or is i
I think that you are completely wrong in your statements
Multiplying two binary numbers takes n^2 time
Multiplying two 32bit numbers take exactly one clock cycle. On a 64 bit processor, I would assume that multiplying two 64 bit numbers take exactly 1 clock cycle. It wouldn't even surprise my that a 32bit processor can multiply two 64bit numbers in 1 clock cycle.
yet squaring a number can be done more efficiently somehow.
Squaring a number is just multiplying the number with itself, so that is just a simple multiplication. There is no "square" operation in the CPU.
Maybe you are confusing "squaring" with "multiplying by a power of 2". Multiplying by 2 can be implemeted by shifting all the bits one position to the "left". Multiplying by 4 is shifting all the bits two positions to the "left". By 8, 3 positions. But this trick only applies to a power of two.
There exist algorithms more efficient than O(N^2) to multiply two numbers (see Karatsuba, Pollard, Schönhage–Strassen, etc.)
The two problems "multiply two arbitrary N-bit numbers" and "Square an arbitrary N-bit number" have the same complexity.
We have
4*x*y = (x+y)^2 - (x-y)^2
So if squaring N-bit integers takes O(f(N)) time, then the product of two arbitrary N-bit integers can be obtained in O(f(N)) too. (that is 2x N-bit sums, 2x N-bit squares, 1x 2N-bit sum, and 1x 2N-bit shift)
And obviously we have
x^2 = x * x
So if multiplying two N-bit integers takes O(f(N)), then squaring a N-bit integer can be done in O(f(N)).
Any algorithm computing the product (resp the square) provides an algorithm to compute the square (resp the product) with the same asymptotic cost.
As noted in other answers, the algorithms used for fast multiplication can be simplified in the case of squaring. The gain will be on the constant in front of the f(N), and not on f(N) itself.
Squaring an n digit number may be faster than multiplying two random n digit numbers. Googling I found this article. It is about arbitrary precision arithmetic but it may be relevant to what your asking. In it the authors say this:
In squaring a large integer, i.e. X^2 = (xn-1, xn-2, ... , x1, x0)^2 many cross-product terms of the form xi * xj and xj * xi are equivalent. They need to be computed only once and then left shifted in order to be doubled. An n-digit squaring operation is performed using only (n^2 + n)/2 single-precision multiplications.
I believe you may be referring to exponentiation by squaring . This technique isn't used for multiplying, but for raising to a power x^n, where n may be large. Rather than multiply x times itself N times, one performs a series of squaring and adding operations which can be mapped to the binary representation of N. The number of multiplication operations (which are more expensive than additions for large numbers) is reduced from N to log(N) with respect to the naive exponentiation algorithm.
I would want to solve the problem by N bit multiplication for a number
A the bits be A(n-1)A(n-2)........A(1)A(0).
B the bits be B(n-1)B(n-2)........B(1)B(0).
for the square of number A the unique multiplication bits generated will be for A(0)->A(0)....A(n-1) A(1)->A(1)....A(n-1) and so on so the total operations will be
OP = n + n-1 + n-2 ....... + 1 Therefore OP = n^2+n/2; so the Asymptotic notation will be O(n^2)
and for multiplication of A and B n^2 unique multiplications will be generated so the Asymptotic notation will be O(n^2)
Like others have pointed out, squaring can only be about 1.5X or 2X faster than regular multiplication between arbitrary numbers. Where does the computational advantage come from? It's symmetry. Let's calculate the square of 1011
and try to spot a pattern that we can exploit. u0:u3
represent the bits in the number from the most significant to the least significant.
1011 // u3 * u0 : u3 * u1 : u3 * u2 : u3 * u3
1011 // u2 * u0 : u2 * u1 : u2 * u2 : u2 * u3
0000 // u1 * u0 : u1 * u1 : u1 * u2 : u1 * u3
1011 // u0 * u0 : u0 * u1 : u0 * u2 : u0 * u3
If you consider the elements ui * ui
for i=0, 1, ..., 4
to form the diagonal and ignore them, you'll see that the elements ui * uj
for i ≠ j
are repeated twice.
Therefore, all you need to do is calculate the product sum for elements below the diagonal and double it, with a left shift. You'd finally add the diagonal elements. Now you can see where the 2X speed up comes from. In practice, the speed-up is about 1.5X because of the diagonal and extra operations.