Why do seemingly empty files and strings produce md5sums?

伪装坚强ぢ 2021-01-31 07:56

Consider the following:

% md5sum /dev/null
d41d8cd98f00b204e9800998ecf8427e  /dev/null
% touch empty; md5sum empty
d41d8cd98f00b204e9800998ecf8427e  empty
% echo         


        
3 Answers
  • 2021-01-31 08:06

    Why do seemingly empty files and strings produce md5sums?

    Because the "sum" in md5sum is somewhat misleading. It is not like, for example, a CRC32 checksum, which is zero for an empty file.

    MD5 is one of the message digest algorithms. You can imagine it as a box that produces a fixed-length, random-looking value (a hash) depending on its internal state. You change the internal state by feeding in data.

    The box's initial internal state is predefined, so it yields a random-looking hash value even before any data is fed in. For MD5, that value happens to be d41d8cd98f00b204e9800998ecf8427e.
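
    The "box" picture can be checked from the shell. What follows is only a minimal sketch, assuming GNU coreutils md5sum and a POSIX printf; the variable names are illustrative. It reads the digest before any data is fed in, then shows that feeding the same bytes in one piece or in several pieces ends in the same digest:

    ```shell
    # Digest of zero bytes of input: what the "box" yields before any data is fed in.
    empty=$(md5sum < /dev/null | cut -d' ' -f1)
    echo "$empty"     # d41d8cd98f00b204e9800998ecf8427e

    # Feeding "abc" in one write or in three separate writes gives the same digest,
    # because only the sequence of bytes matters.
    whole=$(printf 'abc' | md5sum | cut -d' ' -f1)
    pieces=$({ printf 'a'; printf 'b'; printf 'c'; } | md5sum | cut -d' ' -f1)
    echo "$whole"     # 900150983cd24fb0d6963f7d28e17f72 (RFC 1321 test vector for "abc")
    echo "$pieces"    # same value as $whole
    ```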

  • 2021-01-31 08:13

    No need for surprise. The first two commands give md5sum truly empty input. The echo produces a newline (echo -n '' should produce empty output; I don't have a Linux machine here to check). The perl command produces a single zero byte (not to be confused with C, where a zero byte marks the end of a string). The last command looks for a file whose name is the empty string.
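
    One way to confirm what each kind of command actually sends to md5sum is to count the bytes. This is a rough sketch, assuming a POSIX shell with wc and od; printf '\000' is used here only as a stand-in for any command that prints a single zero byte, and the variable names are made up for illustration:

    ```shell
    # $(( ... )) strips the padding some wc implementations put around the count.
    n_empty=$(( $(printf '' | wc -c) ))      # 0 bytes: truly empty input
    n_echo=$(( $(echo | wc -c) ))            # 1 byte: the newline that echo appends
    n_nul=$(( $(printf '\000' | wc -c) ))    # 1 byte: a single zero byte
    echo "$n_empty $n_echo $n_nul"           # 0 1 1

    # od shows which byte it is: 0a is a newline, 00 is a zero byte.
    echo | od -An -tx1
    printf '\000' | od -An -tx1
    ```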

  • 2021-01-31 08:17

    The md5sum of "nothing" (a zero-length stream of characters) is d41d8cd98f00b204e9800998ecf8427e, which you're seeing in your first two examples.

    The third and fourth examples are processing a single character. In the "echo" case, it's a newline, i.e.

    $ echo -ne '\n' | md5sum
    68b329da9893e34099c7d8ad5cb9c940  -
    

    In the perl example, it's a single byte with value 0x00, i.e.

    $ echo -ne '\x00' | md5sum
    93b885adfe0da089cdf634904fd59f71  -
    

    You can reproduce the empty checksum using "echo" as follows:

    $ echo -n '' | md5sum
    d41d8cd98f00b204e9800998ecf8427e  -
    

    ...and using Perl as follows:

    $ perl -e 'print ""' | md5sum
    d41d8cd98f00b204e9800998ecf8427e  -
    

    In all four cases, checksumming the same data always gives the same output, while different data should produce a wildly different checksum, even if only a single character differs. That is the whole point.
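
    To see how far apart two digests land when the inputs differ by a single character, the two can be compared position by position. A small sketch, assuming GNU coreutils md5sum; the inputs 'a' and 'b' and all variable names are arbitrary choices for illustration:

    ```shell
    hash_a=$(printf 'a' | md5sum | cut -d' ' -f1)
    hash_a_again=$(printf 'a' | md5sum | cut -d' ' -f1)
    hash_b=$(printf 'b' | md5sum | cut -d' ' -f1)
    echo "$hash_a"    # 0cc175b9c0f1b6a831c399e269772661 (RFC 1321 test vector for "a")
    echo "$hash_b"

    # Same data, same digest.
    if [ "$hash_a" = "$hash_a_again" ]; then echo "repeatable"; fi

    # Count how many of the 32 hex digits differ between the two digests.
    i=1
    differing=0
    while [ "$i" -le 32 ]; do
      ca=$(printf '%s' "$hash_a" | cut -c"$i")
      cb=$(printf '%s' "$hash_b" | cut -c"$i")
      if [ "$ca" != "$cb" ]; then differing=$((differing + 1)); fi
      i=$((i + 1))
    done
    echo "$differing of 32 hex digits differ"
    ```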
