Fastest hash for non-cryptographic uses?

挽巷 2020-12-04 08:23

I'm essentially preparing phrases to be put into the database, they may be malformed so I want to store a short hash of them instead (I will be simply comparing if they exi

13 answers
  • 2020-12-04 08:28

    Caveat

    The answer below does not answer the question as asked, since it does not recommend hash functions. Remember, "A hash function is any function that can be used to map data of arbitrary size to fixed-size values." (Wikipedia) The answer below recommends transformations that do not guarantee fixed-size results.

    If you are willing to relax the requirement of using a hash function, read on...

    Original Answer

    I suggest urlencode() or base64_encode() for these reasons:

    • You don't need cryptography
    • You want speed
    • You want a way to identify unique strings while cleaning up 'malformed' strings

    Adapting the benchmark code elsewhere in these replies, I've demonstrated that either of these is far faster than any hash algorithm. Depending on your application, you might be able to use urlencode() or base64_encode() to clean up any 'malformed' strings you want to store.
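
    To make the caveat about fixed-size results concrete, here is a minimal sketch that compares the stored length of each transformation with the length of a hash. The helper name describeLengths() and the sample phrases are made up for illustration; only standard PHP functions are used.

```php
<?php
// Sketch: compare the stored size of an encoding with the size of a hash.
// describeLengths() is a hypothetical helper, not something from the answer above.
function describeLengths(string $phrase): array
{
    return [
        'input'     => strlen($phrase),
        'urlencode' => strlen(urlencode($phrase)),      // never shorter than the input
        'base64'    => strlen(base64_encode($phrase)),  // 4 * ceil(n / 3) characters
        'crc32b'    => strlen(hash('crc32b', $phrase)), // always 8 hex characters
        'md5'       => strlen(md5($phrase)),            // always 32 hex characters
    ];
}

foreach (['short', str_repeat('a malformed phrase; ', 50)] as $phrase) {
    print_r(describeLengths($phrase));
}
```

    The encodings are reversible but grow with the input, so they only suit a "short" key column if the phrases themselves are short.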

  • 2020-12-04 08:29

    If you're looking for fast and unique, I recommend xxHash or something that uses the built-in CRC32C instruction on newer CPUs; see https://stackoverflow.com/a/11422479/32453. That answer also links to possibly even faster hashes, if you care less about the possibility of collisions.
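
    In PHP this could look like the sketch below. It assumes that the hash extension of the running build may or may not list xxHash and CRC32C (to my knowledge 'xxh3' and 'xxh64' arrived in PHP 8.1 and 'crc32c' in PHP 7.4), so it asks hash_algos() rather than relying on a version. The function name pickFastAlgo() and the preference order are hypothetical, and whether 'crc32c' actually uses the CPU instruction depends on the build.

```php
<?php
// Sketch: pick the fastest non-cryptographic algorithm this PHP build offers.
// pickFastAlgo() and the preference order are assumptions, not a benchmark result.
function pickFastAlgo(): string
{
    $available = hash_algos();
    foreach (['xxh3', 'xxh64', 'crc32c', 'crc32b'] as $candidate) {
        if (in_array($candidate, $available, true)) {
            return $candidate;
        }
    }
    return 'md5'; // fallback that the hash extension always provides
}

$algo = pickFastAlgo();
echo $algo, ': ', hash($algo, 'some phrase'), PHP_EOL;
```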

  • 2020-12-04 08:33

    CRC32 is faster, but less secure than MD5 and SHA1. There is not that much speed difference between MD5 and SHA1.
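
    For an existence check, output size matters as well as speed, because CRC32 produces only 32 bits. The sketch below uses the standard birthday approximation to estimate the chance of at least one accidental collision. It assumes uniformly distributed hash values, which real data only approximates; collisionProbability() and the figure of 100,000 phrases are hypothetical.

```php
<?php
// Birthday approximation: p ≈ 1 - exp(-n(n-1) / (2 * 2^bits)).
// Assumes uniformly distributed outputs; collisionProbability() is a made-up name.
function collisionProbability(int $items, int $bits): float
{
    $space = 2.0 ** $bits;
    $x = ($items * ($items - 1)) / (2.0 * $space);
    return -expm1(-$x); // equals 1 - exp(-x), but stays accurate when x is tiny
}

foreach ([32 => 'crc32', 128 => 'md5', 160 => 'sha1'] as $bits => $name) {
    printf("%-6s %.3e\n", $name, collisionProbability(100000, $bits));
}
```

    With 100,000 phrases a 32-bit hash is more likely than not to produce at least one collision under this model, while the 128-bit and 160-bit outputs stay negligibly small.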

  • 2020-12-04 08:35

    A ranked list in which, on every loop iteration, all algorithms hash the same inputs.

    <?php
    
    set_time_limit(720);
    
    $begin = startTime();
    $scores = array();
    
    
    foreach(hash_algos() as $algo) {
        $scores[$algo] = 0;
    }
    
    for($i=0;$i<10000;$i++) {
        $number = rand()*100000000000000;
        $string = randomString(500);
    
        foreach(hash_algos() as $algo) {
            $start = startTime();
    
            hash($algo, $number); //Number
            hash($algo, $string); //String
    
            $end = endTime($start);
    
            $scores[$algo] += $end;
        }   
    }
    
    
    asort($scores);
    
    $i=1;
    foreach($scores as $alg => $time) {
        print $i.' - '.$alg.' '.$time.'<br />';
        $i++;
    }
    
    echo "Entire page took ".endTime($begin).' seconds<br />';
    
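    echo "<br /><br /><h2>Hashes Compared</h2>";
    
    // Restart the rank counter; $string still holds the last random string from the loop above.
    $i = 1;
    foreach($scores as $alg => $time) {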
        print $i.' - '.$alg.' '.hash($alg,$string).'<br />';
        $i++;
    }
    
    // Current Unix timestamp in seconds, as a float with microsecond precision.
    function startTime() {
        return microtime(true);
    }
    
    // Seconds elapsed since $starttime.
    function endTime($starttime) {
        return microtime(true) - $starttime;
    }
    
    function randomString($length) {
        $characters = '0123456789abcdefghijklmnopqrstuvwxyz';
        $string = '';    
        for ($p = 0; $p < $length; $p++) {
            $string .= $characters[mt_rand(0, strlen($characters) - 1)];
        }
        return $string;
    }
    
    ?>
    

    And the output

    1 - crc32b 0.111036300659
    2 - crc32 0.112048864365
    3 - md4 0.120795726776
    4 - md5 0.138875722885
    5 - sha1 0.146368741989
    6 - adler32 0.15501332283
    7 - tiger192,3 0.177447080612
    8 - tiger160,3 0.179498195648
    9 - tiger128,3 0.184012889862
    10 - ripemd128 0.184052705765
    11 - ripemd256 0.185411214828
    12 - salsa20 0.198500156403
    13 - salsa10 0.204956293106
    14 - haval160,3 0.206098556519
    15 - haval256,3 0.206891775131
    16 - haval224,3 0.206954240799
    17 - ripemd160 0.207638263702
    18 - tiger192,4 0.208125829697
    19 - tiger160,4 0.208438634872
    20 - tiger128,4 0.209359407425
    21 - haval128,3 0.210256814957
    22 - sha256 0.212738037109
    23 - ripemd320 0.215386390686
    24 - haval192,3 0.215610980988
    25 - sha224 0.218329429626
    26 - haval192,4 0.256464719772
    27 - haval160,4 0.256565093994
    28 - haval128,4 0.257113456726
    29 - haval224,4 0.258928537369
    30 - haval256,4 0.259262084961
    31 - haval192,5 0.288433790207
    32 - haval160,5 0.290239810944
    33 - haval256,5 0.291721343994
    34 - haval224,5 0.294484138489
    35 - haval128,5 0.300224781036
    36 - sha384 0.352449893951
    37 - sha512 0.354603528976
    38 - gost 0.392376661301
    39 - whirlpool 0.629067659378
    40 - snefru256 0.829529047012
    41 - snefru 0.833986997604
    42 - md2 1.80192279816
    Entire page took 22.755341053 seconds
    
    
    Hashes Compared
    
    1 - crc32b 761331d7
    2 - crc32 7e8c6d34
    3 - md4 1bc8785de173e77ef28a24bd525beb68
    4 - md5 9f9cfa3b5b339773b8d6dd77bbe931dd
    5 - sha1 ca2bd798e47eab85655f0ce03fa46b2e6e20a31f
    6 - adler32 f5f2aefc
    7 - tiger192,3 d11b7615af06779259b29446948389c31d896dee25edfc50
    8 - tiger160,3 d11b7615af06779259b29446948389c31d896dee
    9 - tiger128,3 d11b7615af06779259b29446948389c3
    10 - ripemd128 5f221a4574a072bc71518d150ae907c8
    11 - ripemd256 bc89cd79f4e70b73fbb4faaf47a3caf263baa07e72dd435a0f62afe840f5c71c
    12 - salsa20 91d9b963e172988a8fc2c5ff1a8d67073b2c5a09573cb03e901615dc1ea5162640f607e0d7134c981eedb761934cd8200fe90642a4608eacb82143e6e7b822c4
    13 - salsa10 320b8cb8498d590ca2ec552008f1e55486116257a1e933d10d35c85a967f4a89c52158f755f775cd0b147ec64cde8934bae1e13bea81b8a4a55ac2c08efff4ce
    14 - haval160,3 27ad6dd290161b883e614015b574b109233c7c0e
    15 - haval256,3 03706dd2be7b1888bf9f3b151145b009859a720e3fe921a575e11be801c54c9a
    16 - haval224,3 16706dd2c77b1888c29f3b151745b009879a720e4fe921a576e11be8
    17 - ripemd160 f419c7c997a10aaf2d83a5fa03c58350d9f9d2e4
    18 - tiger192,4 112f486d3a9000f822c050a204d284d52473f267b1247dbd
    19 - tiger160,4 112f486d3a9000f822c050a204d284d52473f267
    20 - tiger128,4 112f486d3a9000f822c050a204d284d5
    21 - haval128,3 9d9155d430218e4dcdde1c62962ecca3
    22 - sha256 6027f87b4dd4c732758aa52049257f9e9db7244f78c132d36d47f9033b5c3b09
    23 - ripemd320 9ac00db553b51662826267daced37abfccca6433844f67d8f8cfd243cf78bbbf86839daf0961b61d
    24 - haval192,3 7d706dd2d37c1888eaa53b154948b009e09c720effed21a5
    25 - sha224 b6395266d8c7e40edde77969359e6a5d725f322e2ea4bd73d3d25768
    26 - haval192,4 d87cd76e4c8006d401d7068dce5dec3d02dfa037d196ea14
    27 - haval160,4 f2ffffd76e156d0cd40eec0b8d09c8f23d0f47a437
    28 - haval128,4 f066e6312b91e7ef69f26b2adbeba875
    29 - haval224,4 1b7cd76ea97c06d439d6068d7d56ec3d73dba0373895ea14e465bc0e
    30 - haval256,4 157cd76e8b7c06d432d6068d7556ec3d66dba0371c95ea14e165bc0ec31b9d37
    31 - haval192,5 05f9ea219ae1b98ba33bac6b37ccfe2f248511046c80c2f0
    32 - haval160,5 e054ec218637bc8b4bf1b26b2fb40230e0161904
    33 - haval256,5 48f6ea210ee1b98be835ac6b7dc4fe2f39841104a37cc2f06ceb2bf58ab4fe78
    34 - haval224,5 57f6ea2111e1b98bf735ac6b92c4fe2f43841104ab7cc2f076eb2bf5
    35 - haval128,5 ccb8e0ac1fd12640ecd8976ab6402aa8
    36 - sha384 bcf0eeaa1479bf6bef7ece0f5d7111c3aeee177aa7990926c633891464534cd8a6c69d905c36e882b3350ef40816ed02
    37 - sha512 8def9a1e6e31423ef73c94251d7553f6fe3ed262c44e852bdb43e3e2a2b76254b4da5ef25aefb32aae260bb386cd133045adfa2024b067c2990b60d6f014e039
    38 - gost ef6cb990b754b1d6a428f6bb5c113ee22cc9533558d203161441933d86e3b6f8
    39 - whirlpool 54eb1d0667b6fdf97c01e005ac1febfacf8704da55c70f10f812b34cd9d45528b60d20f08765ced0ab3086d2bde312259aebf15d105318ae76995c4cf9a1e981
    40 - snefru256 20849cbeda5ddec5043c09d36b2de4ba0ea9296b6c9efaa7c7257f30f351aea4
    41 - snefru 20849cbeda5ddec5043c09d36b2de4ba0ea9296b6c9efaa7c7257f30f351aea4
    42 - md2 d4864c8c95786480d1cf821f690753dc
    
  • 2020-12-04 08:36

    There's a speed comparison on the xxHash site. Copying it here:

     Name            Speed       Q.Score   Author
     xxHash          5.4 GB/s     10
     MurmurHash 3a   2.7 GB/s     10       Austin Appleby
     SpookyHash      2.0 GB/s     10       Bob Jenkins
     SBox            1.4 GB/s      9       Bret Mulvey
     Lookup3         1.2 GB/s      9       Bob Jenkins
     CityHash64      1.05 GB/s    10       Pike & Alakuijala
     FNV             0.55 GB/s     5       Fowler, Noll, Vo
     CRC32           0.43 GB/s     9
     MD5-32          0.33 GB/s    10       Ronald L. Rivest
     SHA1-32         0.28 GB/s    10
    

    So xxHash appears to be by far the fastest, and several of the others also beat older hashes such as CRC32, MD5 and SHA1.

    https://code.google.com/p/xxhash/

    Note that this is the ordering on a 32-bit compilation. On a 64-bit compilation the performance order is likely very different. Some of the hashes are heavily based on 64-bit multiplications and fetches.

  • 2020-12-04 08:39
    +-------------------+---------+------+--------------+
    |       NAME        |  LOOPS  | TIME |     OP/S     |
    +-------------------+---------+------+--------------+
    | sha1ShortString   | 1638400 | 2.85 | 574,877.19   |
    | md5ShortString    | 2777680 | 4.11 | 675,834.55   |
    | crc32ShortString  | 3847980 | 3.61 | 1,065,922.44 |
    | sha1MediumString  | 602620  | 4.75 | 126,867.37   |
    | md5MediumString   | 884860  | 4.69 | 188,669.51   |
    | crc32MediumString | 819200  | 4.85 | 168,907.22   |
    | sha1LongString    | 181800  | 4.95 | 36,727.27    |
    | md5LongString     | 281680  | 4.93 | 57,135.90    |
    | crc32LongString   | 226220  | 4.95 | 45,701.01    |
    +-------------------+---------+------+--------------+
    

    It seems that crc32 is faster for short messages (26 characters in this case), while md5 is faster for longer ones (more than 852 characters in this case).
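
    The table above does not come with its benchmark code, so the following is only a rough sketch of how such a measurement might be repeated. The function name opsPerSecond(), the 0.2-second time budget, the filler strings and the use of 'crc32b' for the crc32 rows are all assumptions; the two lengths are taken from the sentence above.

```php
<?php
// Sketch: operations per second for one algorithm at one message length.
// opsPerSecond() and the 0.2-second budget are hypothetical choices.
function opsPerSecond(string $algo, string $message, float $seconds = 0.2): float
{
    $loops = 0;
    $start = microtime(true);
    do {
        hash($algo, $message);
        $loops++;
        $elapsed = microtime(true) - $start;
    } while ($elapsed < $seconds);
    return $loops / $elapsed;
}

foreach ([26, 852] as $length) {
    $message = str_repeat('x', $length);
    foreach (['crc32b', 'md5', 'sha1'] as $algo) {
        printf("%-7s %4d chars %14.2f op/s\n", $algo, $length, opsPerSecond($algo, $message));
    }
}
```

    Calling microtime() on every iteration adds a fixed cost that weighs most on the short-string rows, so treat the absolute numbers as indicative only.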
