I have two different programs that need to hash the same string using Murmur3, one in Python and one in Java.
Python version 2.7.9:
If anyone is interested in the reverse answer, converting the Python output to match the Java output:
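import mmh3
import string

# hash_bytes returns the 128-bit Murmur3 hash as 16 raw bytes
murmur = mmh3.hash_bytes('abc')
# Render each byte as two lowercase hex digits, in byte order (needs Python 3.6+ for f-strings)
result = [f'{string.hexdigits[(byte >> 4) & 0xf]}{string.hexdigits[byte & 0xf]}' for byte in murmur]
print(''.join(result))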
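On Python 3.5 and later, the same hex string can probably be produced more directly with the built-in bytes.hex(). The sketch below compares the two approaches on a stand-in byte string rather than a real mmh3 hash, so it runs without third-party packages; the names sample and to_hex_manual are made up for this illustration.

```python
import string

# Stand-in for mmh3.hash_bytes(...): any bytes object works for the comparison.
sample = bytes([0x00, 0x1f, 0xa0, 0xff])


def to_hex_manual(data):
    """Hex-encode bytes nibble by nibble, as in the snippet above."""
    return ''.join(
        f'{string.hexdigits[(b >> 4) & 0xf]}{string.hexdigits[b & 0xf]}' for b in data
    )


print(to_hex_manual(sample))  # 001fa0ff
print(sample.hex())           # 001fa0ff
```

If that holds for your input, mmh3.hash_bytes('abc').hex() should give the same string as the list comprehension.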
Here's how to get the same result from both:
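// asBytes() returns the hash bytes in little-endian order
byte[] mm3_le = Hashing.murmur3_128().hashString("abc", UTF_8).asBytes();
// BigInteger expects big-endian bytes, so reverse them first
byte[] mm3_be = Bytes.toArray(Lists.reverse(Bytes.asList(mm3_le)));
assertEquals("79267961763742113019008347020647561319",
    new BigInteger(mm3_be).toString());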
The hash code's bytes need to be treated as little endian, but BigInteger interprets bytes as big endian. You were presumably using new BigInteger(hex, 16) to create the BigInteger, but the output of HashCode.toString() is actually a series of pairs of hexadecimal digits representing the hash bytes in the same order they're returned by asBytes() (little endian). (You can also reverse those pairs of hexadecimal digits to get a hex number that does produce the same result when passed to new BigInteger(reversedHex, 16).)
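The pair-reversal mentioned in the parenthetical can be sketched in Python. This is only an illustration, not part of Guava or mmh3: the helper name little_endian_hex_to_int is hypothetical, and the input is assumed to be an even-length hex string in the byte order that HashCode.toString() produces.

```python
def little_endian_hex_to_int(hex_string):
    """Interpret a hex string of little-endian bytes as an unsigned integer.

    Hypothetical helper for illustration; assumes an even number of hex digits.
    """
    raw = bytes.fromhex(hex_string)   # "0102" -> b"\x01\x02"
    reversed_hex = raw[::-1].hex()    # reverse the byte pairs -> "0201"
    return int(reversed_hex, 16)      # same as int.from_bytes(raw, "little")


print(little_endian_hex_to_int("0102"))  # 513, i.e. 0x0201
```

Like new BigInteger(reversedHex, 16), this always yields a non-negative value. new BigInteger(byte[]) reads the bytes as signed two's complement instead, so the two can differ when the highest bit of the reversed bytes is set.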
I think the documentation of toString() is somewhat confusing because of the way it refers to "big endian"; it doesn't actually mean that the output of the method is the hexadecimal number representing the bytes interpreted as big endian.
We have an open issue for adding asBigInteger() to HashCode.