Calculate a hash without using existing hash functions in Hive

小蘑菇 2020-12-22 01:02

I want to calculate a hash for strings in Hive without writing any UDF, using only existing functions, so that I can use a similar approach to get a consistent hash in other languages.

1 Answer
  • 2020-12-22 01:54

    It depends on the version of Hive, cf. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Misc.Functions

    select XYZ, hash(XYZ) from ABC
    has been available for years and, for a single ASCII-only string, returns the same value as plain old java.lang.String.hashCode(), as an INT (32-bit hash).
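    To reproduce this outside Hive, here is a minimal Python sketch of java.lang.String.hashCode() (the h = 31*h + c polynomial over UTF-16 code units, wrapped to a signed 32-bit int). It also includes a second variant that, per my reading of the STRING branch of ObjectInspectorUtils.hashCode() in the Hive source, runs the same loop over the string's UTF-8 bytes treated as signed Java bytes. The two agree for ASCII-only strings and diverge otherwise. The function names are my own, and the Hive variant is an assumption to check against the Hive version you actually run.

```python
def to_int32(x: int) -> int:
    """Wrap an arbitrary Python int to a signed 32-bit value, like Java int overflow."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


def java_string_hashcode(s: str) -> int:
    """java.lang.String.hashCode(): h = 31*h + c over UTF-16 code units."""
    data = s.encode("utf-16-be")
    h = 0
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]  # one UTF-16 code unit (0..65535)
        h = (31 * h + unit) & 0xFFFFFFFF
    return to_int32(h)


def hive_string_hash(s: str) -> int:
    """Assumed Hive hash() of one STRING: same loop, but over UTF-8 bytes as signed bytes."""
    h = 0
    for b in s.encode("utf-8"):
        signed_byte = b - 256 if b > 127 else b  # a Java byte is signed (-128..127)
        h = (31 * h + signed_byte) & 0xFFFFFFFF
    return to_int32(h)


print(java_string_hashcode("hello"))  # prints 99162322
print(hive_string_hash("hello"))      # prints 99162322 (ASCII: both agree)
print(java_string_hashcode("\u00e9"), hive_string_hash("\u00e9"))  # prints 233 -1978
```

    If your strings may contain non-ASCII characters, compare against the UTF-8 variant, not String.hashCode().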

    [Edit 2] Actually it's a bit more complex: hash() accepts a list of arguments of any type (including primitive types that have no built-in hashing method), so a custom approach is used; check ObjectInspectorUtils.hashCode() and ObjectInspectorUtils.getBucketHashCode() in the Hive source code (v2.1).
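    The multi-argument case can be sketched in Python as follows, assuming (from my reading of GenericUDFHash and ObjectInspectorUtils.hashCode(), not verified against every Hive release) that each argument is hashed by a per-type rule and the results are folded with r = 31 * r + h, starting from r = 0. Only NULL, INT and STRING are covered; other types (BIGINT, DOUBLE, DECIMAL, dates, complex types) have their own rules in that source file and are deliberately left out. The names hive_value_hash and hive_hash are hypothetical.

```python
from typing import Optional, Union

# Hypothetical value type: only Hive NULL, INT and STRING are handled in this sketch.
HiveValue = Optional[Union[int, str]]


def to_int32(x: int) -> int:
    """Wrap to a signed 32-bit int, like Java int overflow."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


def hive_value_hash(v: HiveValue) -> int:
    """Assumed per-type hash: NULL -> 0, INT -> its own value, STRING -> UTF-8 byte loop."""
    if v is None:
        return 0
    if isinstance(v, int):
        return to_int32(v)  # assumes v fits in a Hive INT (32-bit)
    if isinstance(v, str):
        h = 0
        for b in v.encode("utf-8"):
            h = (31 * h + (b - 256 if b > 127 else b)) & 0xFFFFFFFF
        return to_int32(h)
    raise TypeError(f"type not covered by this sketch: {type(v).__name__}")


def hive_hash(*args: HiveValue) -> int:
    """Assumed hash(a, b, ...): fold per-argument hashes with r = 31*r + h, starting at 0."""
    r = 0
    for a in args:
        r = (31 * r + hive_value_hash(a)) & 0xFFFFFFFF
    return to_int32(r)


print(hive_hash("a", "b"))  # prints 3105, i.e. 31*97 + 98
```

    With a single argument the fold reduces to that argument's own hash, which is why hash(XYZ) on an ASCII string matches String.hashCode().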

    select XYZ, crc32(XYZ) from ABC
    requires Hive 1.3 and applies a plain old cyclic redundancy check (probably via java.util.zip.CRC32), returning a BIGINT (32-bit checksum).
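    Because CRC-32 is a standardized checksum, any language's CRC-32 over the same bytes should give the same number. A minimal Python sketch, assuming (as guessed above) that Hive runs java.util.zip.CRC32 over the UTF-8 bytes of the string: Python's zlib.crc32 implements the same CRC-32 (zlib/IEEE 802.3 polynomial) as java.util.zip.CRC32. The helper name hive_crc32 is hypothetical.

```python
import zlib


def hive_crc32(s: str) -> int:
    """CRC-32 of the UTF-8 bytes as a non-negative int (Hive returns it as a BIGINT)."""
    # The mask is a no-op on Python 3 but keeps the result unsigned on older versions.
    return zlib.crc32(s.encode("utf-8")) & 0xFFFFFFFF


print(hive_crc32("123456789"))  # prints 3421780262 (0xCBF43926, the standard CRC-32 check value)
```

    Returning an unsigned value is why Hive needs a BIGINT here rather than an INT.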

    select XYZ, md5(XYZ), sha1(XYZ), sha2(XYZ,256), sha2(XYZ,512) from ABC
    requires Hive 1.3 and applies cryptographic hash functions, returning a STRING with the hexadecimal representation of the binary digest (128-, 160-, 256- and 512-bit hashes, respectively).
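    These are the easiest to reproduce elsewhere, since every mainstream language ships MD5 and SHA implementations. A minimal Python sketch using hashlib, assuming Hive hashes the UTF-8 bytes of the string and emits lowercase hex (lower-case both sides before comparing if unsure). The mapping dict and the helper name hive_digest_hex are hypothetical.

```python
import hashlib

# Hive call -> hashlib algorithm name (mapping assumed from the function names).
HIVE_TO_HASHLIB = {
    "md5(x)": "md5",
    "sha1(x)": "sha1",
    "sha2(x,256)": "sha256",
    "sha2(x,512)": "sha512",
}


def hive_digest_hex(s: str, algorithm: str) -> str:
    """Lowercase hex digest of the UTF-8 bytes of s, using a hashlib algorithm name."""
    return hashlib.new(algorithm, s.encode("utf-8")).hexdigest()


for call, algo in HIVE_TO_HASHLIB.items():
    print(call, hive_digest_hex("hello", algo))
```

    The hex strings are 32, 40, 64 and 128 characters long, matching the 128-, 160-, 256- and 512-bit digests.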


    [Edit 1] The answer to that post also has a very good workaround for applying cryptographic hash functions with older versions of Hive, using Apache Commons static methods and reflect().
