I want to calculate a hash for strings in Hive without writing any UDF, using only existing functions, so that I can use a similar approach to get a consistent hash in other languages.
It depends on the version of Hive, cf. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Misc.Functions
select XYZ, hash(XYZ) from ABC
has been available for years and applies plain old java.lang.String.hashCode(), returning an INT (32-bit hash)
[Edit 2] Actually it's a bit more complex, since hash() accepts a list of arguments of any type (incl. primitive types that have no built-in hashing method), so a custom approach is used -- check ObjectInspectorUtils.hashCode() and ObjectInspectorUtils.getBucketHashCode() in the source code here (for V2.1)
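For the common case of a single STRING argument, that value can be reproduced outside Hive by replaying the same byte-wise loop. A minimal Java sketch, assuming the r = r * 31 + b logic over the UTF-8 bytes found in ObjectInspectorUtils.hashCode() (for pure-ASCII input this coincides with String.hashCode()):

import java.nio.charset.StandardCharsets;

public class HiveHashSketch {

    // Mirrors the byte-wise loop used for STRING in ObjectInspectorUtils.hashCode():
    // r = r * 31 + b over the UTF-8 bytes (signed, as in Java/Hive).
    static int hiveStringHash(String s) {
        int r = 0;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            r = r * 31 + b;
        }
        return r;
    }

    public static void main(String[] args) {
        // Should print 96354, i.e. "abc".hashCode(), and is expected to
        // match: select hash('abc')  -- for a single-argument call
        System.out.println(hiveStringHash("abc"));
    }
}

Ports to other languages just have to reproduce the same signed-byte arithmetic with 32-bit integer wrap-around.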
select XYZ, crc32(XYZ) from ABC
requires Hive 1.3 and applies a plain old Cyclic Redundancy Check (probably via java.util.zip.CRC32), returning a BIGINT (32-bit hash)
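Assuming it is indeed the standard CRC-32 under the hood, the same value can be obtained in plain Java with java.util.zip.CRC32 (the unsigned result fits in a long, which is why Hive exposes it as a BIGINT):

import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class HiveCrc32Sketch {

    // CRC-32 over the UTF-8 bytes, returned as an unsigned value in a long.
    static long crc32(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    public static void main(String[] args) {
        // intended to match: select crc32('ABC')
        System.out.println(crc32("ABC"));
    }
}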
select XYZ, md5(XYZ), sha1(XYZ), sha2(XYZ,256), sha2(XYZ,512) from ABC
requires Hive 1.3 and applies strong cryptographic hash functions, returning a STRING with the hexadecimal representation of the binary digest (128, 160, 256 and 512-bit hashes respectively)
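These are standard algorithms, so any language's crypto library should produce the same strings. A minimal Java sketch with java.security.MessageDigest, assuming the input is hashed as UTF-8 bytes and rendered as lowercase hex (which is how Hive prints them):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HiveDigestSketch {

    // Hex-encoded digest over the UTF-8 bytes of the input string.
    static String hexDigest(String algorithm, String s) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance(algorithm).digest(s.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        System.out.println(hexDigest("MD5", "ABC"));      // compare with md5('ABC')
        System.out.println(hexDigest("SHA-1", "ABC"));    // compare with sha1('ABC')
        System.out.println(hexDigest("SHA-256", "ABC"));  // compare with sha2('ABC', 256)
        System.out.println(hexDigest("SHA-512", "ABC"));  // compare with sha2('ABC', 512)
    }
}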
[Edit 1] the answer to that post also has a very good workaround for applying crypto hash functions with older versions of Hive, using Apache Commons static methods and reflect(); a sketch of the Java side of that approach follows.
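The client-side counterpart is simply calling the same Apache Commons Codec static methods directly, so the values line up with what reflect() produces inside Hive. A sketch, assuming the workaround goes through org.apache.commons.codec.digest.DigestUtils (commons-codec is typically already on Hive's classpath):

// Requires commons-codec on the classpath.
import org.apache.commons.codec.digest.DigestUtils;

public class CommonsDigestSketch {
    public static void main(String[] args) {
        // Same static methods a reflect()-based query would invoke, e.g. (assumed query shape):
        //   select reflect('org.apache.commons.codec.digest.DigestUtils', 'sha256Hex', XYZ) from ABC
        System.out.println(DigestUtils.md5Hex("ABC"));     // compare with md5('ABC')
        System.out.println(DigestUtils.sha1Hex("ABC"));    // compare with sha1('ABC')
        System.out.println(DigestUtils.sha256Hex("ABC"));  // compare with sha2('ABC', 256)
    }
}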