Hive hash function resulting in 0, null and 1, why?

Submitted by 社会主义新天地 on 2019-12-25 15:44:29

Question


I am using Hive 0.13.1 and hashing a combination of keys with the default Hive hash function.

The query looks something like this:

select hash(date, token1, token2, parameters["a"], parameters["b"], parameters["c"]) from table1;

I ran it on 150M rows. For 60% of the rows, it produced a proper hash. For the remaining rows, it gave 0, null, or 1 as the hash. I looked at the rows that produced the bad hashes and don't see anything wrong with them. What could be causing this?


Answer 1:


The hash function returns 0 when every supplied argument is an empty string or null.
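To see why all-empty input would collapse to 0, here is a minimal plain-Java sketch that needs no Hive dependencies. It rests on two assumptions that should be checked against the Hive source for your version: that each string field is hashed over its bytes with multiplier 31 (null counting as 0), and that the per-field hash codes are folded together with the same multiplier. The class and method names (HashSketch, fieldHash, combinedHash) are hypothetical and are not part of Hive.

```java
import java.nio.charset.StandardCharsets;

public class HashSketch {
    // Assumed per-field rule: null hashes to 0; a string is hashed
    // over its UTF-8 bytes with multiplier 31.
    public static int fieldHash(String s) {
        if (s == null) {
            return 0;
        }
        int r = 0;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            r = r * 31 + b;
        }
        return r;
    }

    // Assumed combination rule: fold the field hashes left to right,
    // multiplying the running value by 31 before adding the next one.
    public static int combinedHash(String... fields) {
        int h = 0;
        for (String f : fields) {
            h = 31 * h + fieldHash(f);
        }
        return h;
    }

    public static void main(String[] args) {
        // Every field is null or empty, so every term in the fold is 0.
        System.out.println(combinedHash(null, "", null));
        // One non-empty field is enough to move the result away from 0.
        System.out.println(combinedHash("2016-07-27", "", null));
    }
}
```

Under these assumptions, a row hashes to 0 whenever all six expressions in the query evaluate to null or the empty string, so it may be worth checking whether the affected rows have missing map keys or empty tokens.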

If you are familiar with Java, you can check the implementation of the hash function.

Internally, the hash function uses ObjectInspectorUtils.hashCode to compute the hash code of each supplied field. You can use the Java snippet below to test this manually:

import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;

public class TestHash {
    public static void main(String[] args) {
        // A null field: expected to print 0.
        System.out.println(ObjectInspectorUtils.hashCode(
                null, PrimitiveObjectInspectorFactory.javaStringObjectInspector));
        // An empty string field: expected to print 0. The Text value is paired
        // with the writable string inspector so the inspector matches the object type.
        System.out.println(ObjectInspectorUtils.hashCode(
                new Text(""), PrimitiveObjectInspectorFactory.writableStringObjectInspector));
    }
}

Maven dependencies required to run the above program:

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>2.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.2</version>
</dependency>


Source: https://stackoverflow.com/questions/38617437/hive-hash-function-resulting-in-0-null-and-1-why
