Using Hive UDF in Impala gives erroneous results in Impala 1.2.4

Posted by 生来就可爱ヽ(ⅴ<●) on 2020-01-06 14:54:05

Question


I have two Hive UDFs in Java which work perfectly well in Hive.

The two functions are complementary: each is the inverse of the other.

String myUDF(BigInt)
BigInt myUDFReverso(String)

myUDF(myInput) produces an output string, and myUDFReverso(myUDF(myInput)) should give back myInput.

This works in Hive. In Impala (version 1.2.4), myUDF(BigInt) gives the expected answer (the printed value is correct), but passing that value to myUDFReverso(String) does not give back the original input.

I have noticed that length(myUDF(myInput)) is wrong in Impala 1.2.4: it is one greater than expected for every row. It is correct in Hive and also in Impala 2.1.

So I assume that some extra (special) character is being appended to the output of myUDF in Impala 1.2.4, precisely at the end of the Text value returned from the UDF.

I have built a similar UDF for Impala 1.2.4 in C++, and it works correctly.

All these issues are resolved in Impala 2.1 but I cannot upgrade my cluster to it.

So how do I work around this bug?

Reference: http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/v1/v1-2-4/Installing-and-Using-Impala/ciiu_udf.html


Answer 1:


This is IMPALA-1134, which was fixed in Impala 2.1. The returned value was copied incorrectly, so some extra memory could be returned at the end of your string: we were using getBytes(), whose contract is that only the data up to getLength() is valid. As a workaround, you could try encoding the real length in the output of myUDF and then, in your reversal function, reading that length and using only the valid portion. This seems pretty tricky, though. I'd highly recommend finding a way to upgrade to the latest version of Impala, as there have been many bug fixes since 1.4.
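
The length-encoding idea could look roughly like the sketch below. It is only an illustration in plain Java, not something tested against Impala 1.2.4: the class name LengthPrefixedCodec, the methods wrap and unwrap, and the "length:payload" format are all made up for this example, and it assumes the stray data only ever appears after the valid part of the string.

```java
// Hypothetical helper, not part of Hive or Impala.
// Format (an assumption of this sketch): "<character count>:<payload>".
final class LengthPrefixedCodec {
    private static final char SEPARATOR = ':';

    private LengthPrefixedCodec() {}

    /** Prepends the payload's character count, e.g. "abc" becomes "3:abc". */
    static String wrap(String payload) {
        return payload.length() + String.valueOf(SEPARATOR) + payload;
    }

    /** Recovers the payload and ignores anything after the declared length. */
    static String unwrap(String received) {
        int sep = received.indexOf(SEPARATOR);
        if (sep <= 0) {
            throw new IllegalArgumentException("missing length prefix");
        }
        int declared = Integer.parseInt(received.substring(0, sep));
        int start = sep + 1;
        if (declared < 0 || start + declared > received.length()) {
            throw new IllegalArgumentException("declared length out of range");
        }
        // Trailing bytes appended by the buggy copy are simply dropped here.
        return received.substring(start, start + declared);
    }
}
```

With something like this, myUDF would return the wrapped string and myUDFReverso would call unwrap on its argument before doing its real work. Note that the prefix becomes part of the visible output in both Hive and Impala, and that length(myUDF(...)) would still be off in Impala 1.2.4; only the round trip is repaired.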



Source: https://stackoverflow.com/questions/30125455/using-hive-udf-in-impala-gives-erroneous-results-in-impala-1-2-4
