Why does this code give the output C++Sucks? What is the concept behind it?
#include
double m[] = {7709179928849219.0, 771};
i
More readable version:
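#include <stdio.h>

double m[2] = {7709179928849219.0, 771};
// m[0] = 7709179928849219.0;
// m[1] = 771;
int main()
{
    if (m[1]-- != 0)    // counts down from 771; the final test leaves m[1] at -1
    {
        m[0] *= 2;      // doubling adds 1 to the exponent field
        main();
    }
    else
    {
        printf((char*) m);   // prints the bytes of m as a C string
    }
}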
It recursively calls main() 771 times.
In the beginning, m[0] = 7709179928849219.0, which stands for C++Suc;C. In every call, m[0] gets doubled, to "repair" the last two letters. In the last call, m[0] contains the ASCII char representation of C++Sucks, and m[1], which the final m[1]-- has just taken from 0 to -1.0, still has a zero as its first byte in memory, so it provides a null terminator for the C++Sucks string. All under the assumption that m[0] is stored in 8 bytes, so each char takes 1 byte.
Without the recursion and the main() calling (which is illegal in C++), it will look like this:
double m[] = {7709179928849219.0, 0};
for (int i = 0; i < 771; i++)
{
    m[0] *= 2;
}
printf((char*) m);
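For comparison, here is a minimal sketch of the same loop that reads the bytes through memcpy and adds its own terminator, so it does not depend on what ends up in m[1]. The helper name decode_doubled is made up for illustration, and the result is only C++Sucks under the same assumptions as above (8-byte IEEE 754 doubles, little-endian byte order).

```c
#include <string.h>

/* Doubles `start` `times` times and copies the resulting object
 * representation into `out` as a NUL-terminated string.
 * Assumption (not guaranteed by the C standard): 8-byte IEEE 754
 * doubles on a little-endian machine.
 * decode_doubled is a hypothetical helper, not part of the puzzle. */
static void decode_doubled(double start, int times, char out[sizeof(double) + 1])
{
    double value = start;
    for (int i = 0; i < times; i++) {
        value *= 2;                      /* adds 1 to the exponent field */
    }
    memcpy(out, &value, sizeof value);   /* well-defined way to read the bytes */
    out[sizeof value] = '\0';            /* explicit terminator */
}
```

Calling decode_doubled(7709179928849219.0, 771, text) with a char text[9] buffer and then puts(text) should print C++Sucks on such a machine.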
Disclaimer: This answer was posted to the original form of the question, which mentioned only C++ and included a C++ header. The question's conversion to pure C was done by the community, without input from the original asker.
Formally speaking, it's impossible to reason about this program because it's ill-formed (i.e. it's not legal C++). It violates C++11 [basic.start.main]p3:
The function main shall not be used within a program.
This aside, it relies on the fact that on a typical consumer computer, a double is 8 bytes long and uses a certain well-known internal representation (IEEE 754 binary64). The initial values of the array are computed so that when the "algorithm" is performed, the final value of the first double will be such that the internal representation (8 bytes) will be the ASCII codes of the 8 characters C++Sucks. The second element in the array is then -1.0 (the final post-decrement takes it from 0.0 to -1.0), whose first byte is still 0 in the internal representation on a little-endian machine, making this a valid C-style string. This is then sent to output using printf().
Running this on HW where some of the above doesn't hold would result in garbage text (or perhaps even an access out of bounds) instead.
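As a rough sketch of what "the above" amounts to, the following check tests the properties the trick depends on. The function name trick_assumptions_hold is hypothetical, and the list of checks is my reading of the assumptions, not an exhaustive portability test.

```c
#include <float.h>
#include <limits.h>
#include <string.h>

/* Returns 1 when the platform looks like what the trick relies on:
 * 8-bit bytes, 8-byte doubles with a 53-bit binary significand,
 * and little-endian byte order for doubles. Returns 0 otherwise. */
static int trick_assumptions_hold(void)
{
    if (CHAR_BIT != 8 || sizeof(double) != 8) return 0;
    if (FLT_RADIX != 2 || DBL_MANT_DIG != 53) return 0;

    /* 1.0 is 0x3FF0000000000000 in IEEE 754 binary64; on a little-endian
     * machine the most significant byte (0x3F) is stored last. */
    double one = 1.0;
    unsigned char bytes[sizeof one];
    memcpy(bytes, &one, sizeof one);
    return bytes[7] == 0x3F && bytes[6] == 0xF0 && bytes[0] == 0x00;
}
```

If this returns 0, the program from the question should be expected to print something other than C++Sucks.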
First we should recall that double-precision numbers are stored in memory in binary format as follows:
(i) 1 bit for the sign
(ii) 11 bits for the exponent
(iii) 52 bits for the magnitude (the mantissa, or fraction)
The bits are laid out from most significant (i) to least significant (iii).
First the decimal number is converted to the equivalent binary number, and then it is expressed in normalized (scientific) form in binary.
So the number 7709179928849219.0 becomes
(11011011000110111010101010011001010110010101101000011)base 2
=1.1011011000110111010101010011001010110010101101000011 * 2^52
When storing the magnitude bits, the leading 1. is dropped, because every normalized binary number starts with 1.
So the magnitude part becomes:
1011011000110111010101010011001010110010101101000011
Now the power of 2 is 52; we need to add the bias 2^(bits for exponent - 1) - 1, i.e. 2^(11 - 1) - 1 = 1023, so our exponent becomes 52 + 1023 = 1075.
Now our code multiplies the number by 2, 771 times, which increases the exponent by 771.
So our exponent is 1075 + 771 = 1846, whose binary equivalent is (11100110110).
Now our number is positive, so our sign bit is 0.
So our modified number becomes:
sign bit + exponent + magnitude (simple concatenation of the bits)
0111001101101011011000110111010101010011001010110010101101000011
Since m is converted to a char pointer, we split the bit pattern into chunks of 8, starting from the least significant bit:
01110011 01101011 01100011 01110101 01010011 00101011 00101011 01000011
(whose hex equivalent is:)
0x73 0x6B 0x63 0x75 0x53 0x2B 0x2B 0x43
Which, according to the ASCII character map, is:
s k c u S + + C
By this point m[1] has been decremented past 0 to -1.0, whose low-order bytes are all zero, so the byte that follows m[0] in memory is a null character.
Now, assuming that you run this program on a little-endian machine (the lower-order byte is stored at the lower address), the pointer m points to the lowest-addressed byte. printf() then reads the bytes one at a time (since m is cast to char*) and stops when it encounters 00000000, the first byte of m[1].
This code is, however, not portable.
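The sign/exponent/magnitude split worked out above can be checked with a short sketch. The struct and function names here are made up for illustration, and the code assumes a 64-bit IEEE 754 double that shares its byte order with uint64_t.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of an IEEE 754 binary64 value (hypothetical struct). */
struct double_fields {
    unsigned sign;       /* 1 bit */
    unsigned exponent;   /* 11 bits, biased by 1023 */
    uint64_t fraction;   /* 52 bits, the implicit leading 1 is not stored */
};

/* Copies the bits of `value` into an integer and masks out each field. */
static struct double_fields split_double(double value)
{
    uint64_t bits;
    struct double_fields f;
    memcpy(&bits, &value, sizeof bits);
    f.sign = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FF);
    f.fraction = bits & ((UINT64_C(1) << 52) - 1);
    return f;
}
```

For 7709179928849219.0 this should report sign 0 and exponent 1075; after 771 doublings the exponent should read 1846 while the fraction bits stay the same.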
It is just building up a double array (16 bytes) which, if interpreted as a char array, builds up the ASCII codes for the string "C++Sucks".
However, the code does not work on every system; it relies on some of the following undefined facts:
Perhaps the easiest way to understand the code is to work through things in reverse. We'll start with a string to print out -- for balance, we'll use "C++Rocks". Crucial point: just like the original, it's exactly eight characters long. Since we're going to do (roughly) like the original, and print it out in reverse order, we'll start by putting it in, in reverse order. For our first step, we'll just view that bit pattern as a double, and print out the result:
#include <stdio.h>
char string[] = "skcoR++C";
int main(){
    printf("%f\n", *(double*)string);
}
This produces 3823728713643449.5. So, we want to manipulate that in some way that isn't obvious, but is easy to reverse. I'll semi-arbitrarily choose multiplication by 256, which gives us 978874550692723072. Now, we just need to write some obfuscated code to divide by 256, then print out the individual bytes of that in reverse order:
#include <stdio.h>
double x [] = { 978874550692723072, 8 };
char *y = (char *)x;
int main(int argc, char **argv){
    if (x[1]) {
        x[0] /= 2;
        main(--x[1], (char **)++y);
    }
    putchar(*--y);
}
Now we have lots of casting, passing arguments to (recursive) main that are completely ignored (but evaluating them to get the increment and decrement is utterly crucial), and of course that completely arbitrary looking number to cover up the fact that what we're doing is really pretty straightforward.
Of course, since the whole point is obfuscation, if we feel like it we can take more steps as well. Just for example, we can take advantage of short-circuit evaluation, to turn our if statement into a single expression, so the body of main looks like this:
x[1] && (x[0] /= 2, main(--x[1], (char **)++y));
putchar(*--y);
To anybody who isn't accustomed to obfuscated code (and/or code golf) this starts to look pretty strange indeed -- computing and discarding the logical and of some meaningless floating point number and the return value from main, which isn't even returning a value. Worse, without realizing (and thinking about) how short-circuit evaluation works, it may not even be immediately obvious how it avoids infinite recursion.
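To make the short-circuit point concrete, here is a small stand-alone sketch of the same pattern outside main. The names countdown and calls_made are invented for this illustration; the only thing it is meant to show is that the right-hand side of && is skipped once the left-hand side is zero.

```c
static int calls_made = 0;   /* hypothetical counter, only for illustration */

/* Recurses while n is non-zero. When n is 0 the right-hand side of &&
 * is never evaluated, so no further call is made and the recursion ends.
 * Returns the total number of calls made so far. */
static int countdown(int n)
{
    calls_made++;
    (void)(n && countdown(n - 1));
    return calls_made;
}
```

Starting from 8, as x[1] does above, this makes nine calls in total: the initial one plus eight recursive ones.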
Our next step would probably be to separate printing each character from finding that character. We can do that pretty easily by generating the right character as the return value from main, and printing out what main returns:
x[1] && (x[0] /= 2, putchar(main(--x[1], (char **)++y)));
return *--y;
At least to me, that seems obfuscated enough, so I'll leave it at that.
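The same "work in reverse" idea can be pointed back at the original puzzle to see where its constant comes from. This is only a sketch: magic_constant is a hypothetical helper, it assumes 64-bit IEEE 754 doubles that share their byte order with uint64_t, and it places the first character in the least significant byte, as a little-endian machine would.

```c
#include <stdint.h>
#include <string.h>

/* Packs exactly eight characters into a double, first character in the
 * least significant byte, then halves the result `halvings` times.
 * The return value is the number that has to be doubled `halvings`
 * times to get the text back. Halving is exact as long as the result
 * stays a normal (not subnormal) number. */
static double magic_constant(const char text[8], int halvings)
{
    uint64_t bits = 0;
    for (int i = 7; i >= 0; i--) {
        bits = (bits << 8) | (unsigned char)text[i];
    }
    double value;
    memcpy(&value, &bits, sizeof value);
    for (int i = 0; i < halvings; i++) {
        value /= 2;                      /* subtracts 1 from the exponent field */
    }
    return value;
}
```

Under those assumptions, magic_constant("C++Sucks", 771) should give 7709179928849219.0, and magic_constant("skcoR++C", 0) should give the 3823728713643449.5 printed earlier.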
The number 7709179928849219.0 has the following binary representation as a 64-bit double:
01000011 00111011 01100011 01110101 01010011 00101011 00101011 01000011
+^^^^^^^ ^^^^---- -------- -------- -------- -------- -------- --------
+ shows the position of the sign; ^ of the exponent, and - of the mantissa (i.e. the value without the exponent).
Since the representation uses binary exponent and mantissa, doubling the number increments the exponent by one. Your program does it precisely 771 times, so the exponent which started at 1075 (decimal representation of 10000110011) becomes 1075 + 771 = 1846 at the end; binary representation of 1846 is 11100110110. The resultant pattern looks like this:
01110011 01101011 01100011 01110101 01010011 00101011 00101011 01000011
-------- -------- -------- -------- -------- -------- -------- --------
0x73 's' 0x6B 'k' 0x63 'c' 0x75 'u' 0x53 'S' 0x2B '+' 0x2B '+' 0x43 'C'
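This pattern corresponds to the string that you see printed, only backwards (a little-endian machine stores the least significant byte first). At the same time, the second element of the array reaches zero (the final post-decrement then leaves it at -1.0, whose low-order bytes are still all zero), providing a null terminator and making the string suitable for passing to printf().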
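The two bit patterns above can be reproduced with a small sketch like the following. The helper name format_bits is made up, and the code assumes a 64-bit IEEE 754 double that shares its byte order with uint64_t.

```c
#include <stdint.h>
#include <string.h>

/* Writes the 64 bits of `value`, most significant bit first, into `out`
 * as eight space-separated groups of eight digits
 * (64 digits + 7 spaces + terminating NUL = 72 bytes). */
static void format_bits(double value, char out[72])
{
    uint64_t bits;
    size_t pos = 0;
    memcpy(&bits, &value, sizeof bits);
    for (int i = 63; i >= 0; i--) {
        out[pos++] = ((bits >> i) & 1) ? '1' : '0';
        if (i % 8 == 0 && i != 0) {
            out[pos++] = ' ';            /* separator between bytes */
        }
    }
    out[pos] = '\0';
}
```

Formatting 7709179928849219.0 before and after the 771 doublings should give the first and second pattern respectively, differing only in the exponent bits.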