Question
I have two programs that need to hash the same string with Murmur3, one in Python and one in Java.
Python version 2.7.9:
mmh3.hash128('abc')
Gives 79267961763742113019008347020647561319L.
Java, using Guava 18.0:
HashCode hashCode = Hashing.murmur3_128().newHasher().putString("abc", StandardCharsets.UTF_8).hash();
Gives the string "6778ad3f3f3f96b4522dca264174a23b"; converting that to a BigInteger gives 137537073056680613988840834069010096699.
How can I get the same result from both?
Thanks
Answer 1:
Here's how to get the same result from both:
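// Guava returns the hash bytes in little-endian order.
byte[] mm3_le = Hashing.murmur3_128().hashString("abc", UTF_8).asBytes();
// Reverse them, because BigInteger expects big-endian input.
byte[] mm3_be = Bytes.toArray(Lists.reverse(Bytes.asList(mm3_le)));
// Signum 1 reads the bytes as an unsigned magnitude, so a leading byte >= 0x80 does not produce a negative number.
assertEquals("79267961763742113019008347020647561319",
    new BigInteger(1, mm3_be).toString());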
The hash code's bytes need to be treated as little endian, but BigInteger interprets bytes as big endian. You were presumably using new BigInteger(hex, 16) to create the BigInteger, but the output of HashCode.toString() is actually a series of pairs of hexadecimal digits representing the hash bytes in the same order they're returned by asBytes() (little endian). (You can also reverse those pairs of hexadecimal digits to get a hex number that does produce the same result when passed to new BigInteger(reversedHex, 16).)
I think the documentation of toString() is somewhat confusing because of the way it refers to "big endian"; it doesn't actually mean that the output of the method is the hexadecimal number representing the bytes interpreted as big endian.
We have an open issue for adding asBigInteger() to HashCode.
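The byte-order point can be checked without either hashing library. The following is a minimal Python 3 sketch that assumes only the two values quoted in the question; the variable names are illustrative and not part of Guava or mmh3.

```python
# Hex string printed by Guava's HashCode.toString() in the question.
guava_hex = "6778ad3f3f3f96b4522dca264174a23b"
# Integer returned by mmh3.hash128('abc') in the question.
mmh3_value = 79267961763742113019008347020647561319

raw = bytes.fromhex(guava_hex)

# Reading the same 16 bytes as little endian reproduces the Python result.
as_little_endian = int.from_bytes(raw, "little")

# Equivalent route: reverse the hex pairs, then parse as an ordinary base-16 number.
reversed_hex = raw[::-1].hex()
as_reversed_hex = int(reversed_hex, 16)

print(as_little_endian == mmh3_value)
print(as_reversed_hex == mmh3_value)
```

Both comparisons hold for the values in the question, so the two libraries agree on the hash bytes and differ only in how those bytes are turned into a number.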
Answer 2:
If anyone is interested in the reverse direction, this converts the Python output to the Java output:
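import mmh3
import string
# Requires Python 3.6+ (f-strings; iterating over bytes yields ints).
# hash_bytes returns the 16 raw hash bytes.
murmur = mmh3.hash_bytes('abc')
# Format each byte as two lowercase hex digits (string.hexdigits starts with '0123456789abcdef').
result = [f'{string.hexdigits[(byte >> 4) & 0xf]}{string.hexdigits[byte & 0xf]}' for byte in murmur]
print(''.join(result))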
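On Python 3 the same conversion can be sketched with standard-library calls only. This version takes the integer quoted in the question as input instead of calling mmh3, so it runs without the library installed; `to_guava_hex` is a hypothetical helper name, not part of either library.

```python
def to_guava_hex(hash128_value: int) -> str:
    """Render an unsigned 128-bit hash value as little-endian hex pairs."""
    # to_bytes(16, "little") lays the value out least significant byte first,
    # which is the order Guava's HashCode.toString() prints.
    return hash128_value.to_bytes(16, "little").hex()


# Value of mmh3.hash128('abc') from the question.
print(to_guava_hex(79267961763742113019008347020647561319))
# prints 6778ad3f3f3f96b4522dca264174a23b
```

If mmh3 is installed, `mmh3.hash_bytes('abc').hex()` should give the same string, since `bytes.hex()` formats each byte as two lowercase hex digits.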
Source: https://stackoverflow.com/questions/29932956/murmur3-hash-different-result-between-python-and-java-implementation