How does this print “hello world”?

Happy的楠姐 2021-01-29 17:33

I discovered this oddity:

for (long l = 4946144450195624l; l > 0; l >>= 5)
    System.out.print((char) (((l & 31 | 64) % 95) + 32));
9 answers
  • 2021-01-29 17:55

    The number 4946144450195624 fits in 64 bits; its binary representation is:

     10001100100100111110111111110111101100011000010101000
    

    The program decodes one character from every 5-bit group, from right to left:

     00100|01100|10010|01111|10111|11111|01111|01100|01100|00101|01000
       d  |  l  |  r  |  o  |  w  |     |  o  |  l  |  l  |  e  |  h
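    The grouping above can be reproduced with a short sketch. The class and method names here (FiveBitGroups, split) are made up for illustration; the loop is the same shift-and-mask loop as in the question, except that it lists the lowest group first.

```java
import java.util.ArrayList;
import java.util.List;

class FiveBitGroups {
    // Returns the 5-bit groups of n, lowest group first,
    // each rendered as a zero-padded 5-digit binary string.
    static List<String> split(long n) {
        List<String> groups = new ArrayList<>();
        for (long l = n; l > 0; l >>= 5) {
            String bits = Long.toBinaryString(l & 31);
            // Left-pad to exactly 5 digits.
            groups.add("00000".substring(bits.length()) + bits);
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", split(4946144450195624L)));
    }
}
```

    Because the lowest group comes first, the printed groups are in reading order (h, e, l, l, o, ...), the reverse of the layout shown above.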
    

    5-bit codification

    With 5 bits it is possible to represent 2⁵ = 32 characters. The English alphabet contains 26 letters, which leaves room for 32 - 26 = 6 symbols besides the letters. With this codification scheme you can have all 26 English letters (in one case) and 6 symbols (space being one of them).
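    To see which 32 characters this particular decoder can produce, one can run the question's expression over every 5-bit code. This is only a sketch; FiveBitAlphabet and decode are illustrative names, not part of the original code.

```java
class FiveBitAlphabet {
    // Character produced by the question's expression for a 5-bit code (0..31).
    static char decode(int code) {
        return (char) (((code & 31 | 64) % 95) + 32);
    }

    public static void main(String[] args) {
        // Codes 1..26 give a..z; the remaining six codes give symbols.
        for (int code = 0; code < 32; code++) {
            System.out.println(code + " -> '" + decode(code) + "'");
        }
    }
}
```

    With this expression the six non-letter codes come out as the backtick (code 0), the characters { | } ~ (codes 27 to 30) and the space (code 31).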

    Algorithm description

    The >>= 5 in the for loop jumps from group to group; each 5-bit group is then isolated by ANDing the number with the mask 31₁₀ = 11111₂ in the expression l & 31.

    Now the code maps the 5-bit value to its corresponding 7-bit ASCII character. This is the tricky part; check the binary representations of the lowercase letters in the following table:

      ascii   |     ascii     |    ascii     |    algorithm
    character | decimal value | binary value | 5-bit codification 
    --------------------------------------------------------------
      space   |       32      |   0100000    |      11111
        a     |       97      |   1100001    |      00001
        b     |       98      |   1100010    |      00010
        c     |       99      |   1100011    |      00011
        d     |      100      |   1100100    |      00100
        e     |      101      |   1100101    |      00101
        f     |      102      |   1100110    |      00110
        g     |      103      |   1100111    |      00111
        h     |      104      |   1101000    |      01000
        i     |      105      |   1101001    |      01001
        j     |      106      |   1101010    |      01010
        k     |      107      |   1101011    |      01011
        l     |      108      |   1101100    |      01100
        m     |      109      |   1101101    |      01101
        n     |      110      |   1101110    |      01110
        o     |      111      |   1101111    |      01111
        p     |      112      |   1110000    |      10000
        q     |      113      |   1110001    |      10001
        r     |      114      |   1110010    |      10010
        s     |      115      |   1110011    |      10011
        t     |      116      |   1110100    |      10100
        u     |      117      |   1110101    |      10101
        v     |      118      |   1110110    |      10110
        w     |      119      |   1110111    |      10111
        x     |      120      |   1111000    |      11000
        y     |      121      |   1111001    |      11001
        z     |      122      |   1111010    |      11010
    

    Here you can see that the ASCII characters we want to map begin with the 7th and 6th bits set (11xxxxx₂), except for space, which only has the 6th bit on. You could OR the 5-bit codification with 96 (96₁₀ = 1100000₂) and that would be enough to do the mapping for the letters, but it wouldn't work for space (darn space!).

    Now we know that special care has to be taken to process space at the same time as the other characters. To achieve this, the code turns the 7th bit on (but not the 6th) in the extracted 5-bit group with an OR 64 (64₁₀ = 1000000₂): (l & 31 | 64).

    So far the 5-bit group has the form 10xxxxx₂ (space would be 1011111₂ = 95₁₀). If we can map space to 0 without affecting the other values, then we can turn the 6th bit on and that is all we need. This is where the mod 95 part comes into play: space is 1011111₂ = 95₁₀, so with the mod operation ((l & 31 | 64) % 95) only space goes back to 0. After this, the code turns the 6th bit on by adding 32₁₀ = 100000₂ to the previous result, (((l & 31 | 64) % 95) + 32), transforming the 5-bit value into a valid ASCII character.

    isolates 5 bits --+          +---- takes 'space' (and only 'space') back to 0
                      |          |
                      v          v
                  ((l & 31 | 64) % 95) + 32
                           ^           ^ 
           turns the       |           |
          7th bit on ------+           +--- turns the 6th bit on
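    The same four stages can be written out one per line. This is a sketch equivalent to the question's one-liner; the class name StagedDecode and the local variable names are chosen here for illustration only.

```java
class StagedDecode {
    // Decodes a packed long, 5 bits per character, lowest group first.
    static String decode(long packed) {
        StringBuilder out = new StringBuilder();
        for (long l = packed; l > 0; l >>= 5) {
            long group = l & 31;             // isolate the lowest 5 bits
            long with7th = group | 64;       // 10xxxxx: a value in 64..95
            long spaceToZero = with7th % 95; // only 95 (the space code) becomes 0
            long ascii = spaceToZero + 32;   // 6th bit is clear here, so +32 sets it
            out.append((char) ascii);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode(4946144450195624L)); // prints "hello world"
    }
}
```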
    

    The following code does the inverse: given a lowercase string (max 12 chars), it returns the 64-bit long value that can be used with the OP's code:

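    public class D {
        public static void main(String... args) {
            String v = "hello test";
            // 12 characters * 5 bits = 60 bits, which still fits in a positive long
            int len = Math.min(12, v.length());
            long res = 0L;
            for (int i = 0; i < len; i++) {
                // Low 5 bits of the ASCII code: 1..26 for a..z, 0 for space
                long c = (long) v.charAt(i) & 31;
                // (31 - c) / 31 is 1 only when c == 0, so space becomes 31; letters are unchanged
                res |= ((((31 - c) / 31) * 31) | c) << 5 * i;
            }
            System.out.println(res);
        }
    }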
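    A quick way to check the encoder is a round trip through the decoder from the question. The sketch below wraps both in methods; RoundTrip, encode and decode are hypothetical names introduced here, while the arithmetic is taken unchanged from the code above and from the question.

```java
class RoundTrip {
    // Packs up to 12 lowercase letters/spaces, 5 bits each, first character lowest.
    static long encode(String v) {
        int len = Math.min(12, v.length());
        long res = 0L;
        for (int i = 0; i < len; i++) {
            long c = (long) v.charAt(i) & 31;
            res |= ((((31 - c) / 31) * 31) | c) << 5 * i;
        }
        return res;
    }

    // The loop from the question, collecting characters instead of printing them.
    static String decode(long packed) {
        StringBuilder out = new StringBuilder();
        for (long l = packed; l > 0; l >>= 5) {
            out.append((char) (((l & 31 | 64) % 95) + 32));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("hello world"));        // prints 4946144450195624
        System.out.println(decode(encode("hello test"))); // prints "hello test"
    }
}
```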
    
  • 2021-01-29 18:03

    The result you get is the char representation of the values below:

    104 -> h
    101 -> e
    108 -> l
    108 -> l
    111 -> o
    32  -> (space)
    119 -> w
    111 -> o
    114 -> r
    108 -> l
    100 -> d
    
  • 2021-01-29 18:04

    Without an Oracle tag, it was difficult to see this question. Active bounty brought me here. I wish the question had other relevant technology tags too :-(

    I mostly work with Oracle database, so I would use some Oracle knowledge to interpret and explain :-)

    Let's convert the number 4946144450195624 into binary. For that I use a small function called dec2bin i.e. decimal-to-binary.

    SQL> CREATE OR REPLACE FUNCTION dec2bin (N in number) RETURN varchar2 IS
      2    binval varchar2(64);
      3    N2     number := N;
      4  BEGIN
      5    while ( N2 > 0 ) loop
      6       binval := mod(N2, 2) || binval;
      7       N2 := trunc( N2 / 2 );
      8    end loop;
      9    return binval;
     10  END dec2bin;
     11  /
    
    Function created.
    
    SQL> show errors
    No errors.
    SQL>
    

    Let's use the function to get the binary value -

    SQL> SELECT dec2bin(4946144450195624) FROM dual;
    
    DEC2BIN(4946144450195624)
    --------------------------------------------------------------------------------
    10001100100100111110111111110111101100011000010101000
    
    SQL>
    

    Now the catch is the 5-bit conversion. Start grouping from right to left with 5 digits in each group. We get :-

    100|01100|10010|01111|10111|11111|01111|01100|01100|00101|01000
    

    We are finally left with just 3 digits in the leftmost group, because the binary representation has 53 digits in total.

    SQL> SELECT LENGTH(dec2bin(4946144450195624)) FROM dual;
    
    LENGTH(DEC2BIN(4946144450195624))
    ---------------------------------
                                   53
    
    SQL>
    

    hello world has 11 characters in total (including the space), so we pad the leftmost group, which was left with just 3 bits after grouping, with two leading zeros.

    So, now we have :-

    00100|01100|10010|01111|10111|11111|01111|01100|01100|00101|01000
    

    Now we need to convert each group to a 7-bit ASCII value. For the letters this is easy: we just need to set the 6th and 7th bits, that is, prepend 11 to the left of each 5-bit group above.

    That gives :-

    1100100|1101100|1110010|1101111|1110111|1111111|1101111|1101100|1101100|1100101|1101000
    

    Let's interpret the binary values, I will use binary to decimal conversion function.

    SQL> CREATE OR REPLACE FUNCTION bin2dec (binval in char) RETURN number IS
      2    i                 number;
      3    digits            number;
      4    result            number := 0;
      5    current_digit     char(1);
      6    current_digit_dec number;
      7  BEGIN
      8    digits := length(binval);
      9    for i in 1..digits loop
     10       current_digit := SUBSTR(binval, i, 1);
     11       current_digit_dec := to_number(current_digit);
     12       result := (result * 2) + current_digit_dec;
     13    end loop;
     14    return result;
     15  END bin2dec;
     16  /
    
    Function created.
    
    SQL> show errors;
    No errors.
    SQL>
    

    Let's look at each binary value -

    SQL> set linesize 1000
    SQL>
    SQL> SELECT bin2dec('1100100') val,
      2    bin2dec('1101100') val,
      3    bin2dec('1110010') val,
      4    bin2dec('1101111') val,
      5    bin2dec('1110111') val,
      6    bin2dec('1111111') val,
      7    bin2dec('1101111') val,
      8    bin2dec('1101100') val,
      9    bin2dec('1101100') val,
     10    bin2dec('1100101') val,
     11    bin2dec('1101000') val
     12  FROM dual;
    
           VAL        VAL        VAL        VAL        VAL        VAL        VAL        VAL        VAL        VAL        VAL
    ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
           100        108        114        111        119        127        111        108        108        101        104
    
    SQL>
    

    Let's look at what characters they are :-

    SQL> SELECT chr(bin2dec('1100100')) character,
      2    chr(bin2dec('1101100')) character,
      3    chr(bin2dec('1110010')) character,
      4    chr(bin2dec('1101111')) character,
      5    chr(bin2dec('1110111')) character,
      6    chr(bin2dec('1111111')) character,
      7    chr(bin2dec('1101111')) character,
      8    chr(bin2dec('1101100')) character,
      9    chr(bin2dec('1101100')) character,
     10    chr(bin2dec('1100101')) character,
     11    chr(bin2dec('1101000')) character
     12  FROM dual;
    
    CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER CHARACTER
    --------- --------- --------- --------- --------- --------- --------- --------- --------- --------- ---------
    d         l         r         o         w         ⌂         o         l         l         e         h
    
    SQL>
    

    So, what do we get in the output?

    d l r o w ⌂ o l l e h

    That is hello⌂world in reverse. The only issue is the space. And the reason is well explained by @higuaro in his answer. I honestly couldn't interpret the space issue myself at first attempt, until I saw the explanation given in his answer.
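    The same experiment in Java, as a sketch: setting the 6th and 7th bits is an OR with 96, and applying it to every group reproduces the result above, with DEL (127) where the space should be. NaiveOr96 and decodeNaive are illustrative names only.

```java
class NaiveOr96 {
    // Decodes every 5-bit group by simply setting bits 6 and 7 (OR 96).
    // Letters come out right, but the space code 11111 becomes 1111111 = 127 (DEL).
    static String decodeNaive(long packed) {
        StringBuilder out = new StringBuilder();
        for (long l = packed; l > 0; l >>= 5) {
            out.append((char) (l & 31 | 96));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = decodeNaive(4946144450195624L);
        // Print the numeric code of the sixth character instead of the raw DEL.
        System.out.println(s.substring(0, 5) + "<" + (int) s.charAt(5) + ">" + s.substring(6));
    }
}
```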

  • 2021-01-29 18:09

    Adding some value to the answers above: the following Groovy script prints the intermediate values.

    String getBits(long l) {
        return Long.toBinaryString(l).padLeft(8,'0');
    }
    
    for (long l = 4946144450195624l; l > 0; l >>= 5){
        println ''
        print String.valueOf(l).toString().padLeft(16,'0')
        print '|'+ getBits((l & 31 ))
        print '|'+ getBits(((l & 31 | 64)))
        print '|'+ getBits(((l & 31 | 64)  % 95))
        print '|'+ getBits(((l & 31 | 64)  % 95 + 32))
    
        print '|';
        System.out.print((char) (((l & 31 | 64) % 95) + 32));
    }
    

    Here is the output:

    4946144450195624|00001000|01001000|01001000|01101000|h
    0154567014068613|00000101|01000101|01000101|01100101|e
    0004830219189644|00001100|01001100|01001100|01101100|l
    0000150944349676|00001100|01001100|01001100|01101100|l
    0000004717010927|00001111|01001111|01001111|01101111|o
    0000000147406591|00011111|01011111|00000000|00100000| 
    0000000004606455|00010111|01010111|01010111|01110111|w
    0000000000143951|00001111|01001111|01001111|01101111|o
    0000000000004498|00010010|01010010|01010010|01110010|r
    0000000000000140|00001100|01001100|01001100|01101100|l
    0000000000000004|00000100|01000100|01000100|01100100|d
    
  • 2021-01-29 18:10

    I found the code slightly easier to understand when translated into PHP, as follows:

    <?php
    
    $result=0;
    $bignum = 4946144450195624;
    for (; $bignum > 0; $bignum >>= 5){
        $result = (( $bignum & 31 | 64) % 95) + 32;
        echo chr($result);
    }
    

    See live code

  • 2021-01-29 18:16

    You've encoded characters as 5-bit values and packed 11 of them into a 64 bit long.

    (packedValues >> 5*i) & 31 is the i-th encoded value with a range 0-31.
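    As a small illustration of that indexing expression, assuming the packed value from the question (NthGroup and codeAt are names made up for this sketch):

```java
class NthGroup {
    // Returns the i-th 5-bit code (0-based, lowest group first), in the range 0..31.
    static int codeAt(long packedValues, int i) {
        return (int) ((packedValues >> 5 * i) & 31);
    }

    public static void main(String[] args) {
        long packed = 4946144450195624L;
        System.out.println(codeAt(packed, 0));  // 8  -> 'h'
        System.out.println(codeAt(packed, 5));  // 31 -> space
        System.out.println(codeAt(packed, 10)); // 4  -> 'd'
    }
}
```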

    The hard part, as you say, is encoding the space. The lowercase English letters occupy the contiguous range 97-122 in Unicode (and ASCII, and most other encodings), but the space is 32.

    To overcome this, you used some arithmetic. ((x+64)%95)+32 is almost the same as x + 96 (note how bitwise OR is equivalent to addition in this case, because x < 32 never overlaps the bit set by 64), but when x=31, we get 32.
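    Both claims are easy to check exhaustively over the 32 possible codes. The following is a sketch with hypothetical helper names (AddVersusOr, viaAdd, viaOr):

```java
class AddVersusOr {
    // The arithmetic form used in this answer.
    static int viaAdd(int x) {
        return ((x + 64) % 95) + 32;
    }

    // The bitwise form used in the question.
    static int viaOr(int x) {
        return ((x | 64) % 95) + 32;
    }

    public static void main(String[] args) {
        for (int x = 0; x < 32; x++) {
            // For x in 0..30 both forms equal x + 96; for x == 31 both give 32.
            System.out.println(x + ": add=" + viaAdd(x) + " or=" + viaOr(x) + " x+96=" + (x + 96));
        }
    }
}
```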
