hiveql

Calculate the difference between start_time and end_time in seconds, from timestamps in 'yyyy-MM-dd HH:mm:ss' format

妖精的绣舞 submitted on 2021-02-16 14:52:57
Question: I'm still learning SQL, and I found a couple of solutions for SQL Server and Postgres, but they don't seem to work in Hue. DATEDIFF only lets me calculate the difference in days; seconds and minutes are not available. Help is very welcome. I was able to split the timestamp with substring_index, but then I can't find the right approach to compare and subtract start_time from end_time to obtain an accurate count of seconds. I can't find time functions, so I'm assuming I should …
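
A minimal HiveQL sketch of the usual approach, assuming start_time and end_time are strings in 'yyyy-MM-dd HH:mm:ss' format and my_table is a hypothetical table name: unix_timestamp() parses such strings into epoch seconds, so a plain subtraction gives the difference in seconds.

-- unix_timestamp(string) parses the default 'yyyy-MM-dd HH:mm:ss' format
-- into epoch seconds, so the subtraction below yields whole seconds
SELECT unix_timestamp(end_time) - unix_timestamp(start_time) AS diff_seconds
FROM my_table;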

How to convert “2019-11-02T20:18:00Z” to timestamp in HQL?

拟墨画扇 submitted on 2021-02-15 07:28:06
Question: I have the datetime string "2019-11-02T20:18:00Z". How can I convert it into a timestamp in Hive HQL? Answer 1: If you want to preserve milliseconds, remove the Z, replace the T with a space, and convert to timestamp: select timestamp(regexp_replace("2019-11-02T20:18:00Z", '^(.+?)T(.+?)Z$','$1 $2')); Result: 2019-11-02 20:18:00. It also works with milliseconds: select timestamp(regexp_replace("2019-11-02T20:18:00.123Z", '^(.+?)T(.+?)Z$','$1 $2')); Result: 2019-11-02 20:18:00.123. Using from_unixtime(unix_timestamp(...)) also works, but unix_timestamp resolves to whole seconds, so milliseconds are lost.
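
A sketch of that from_unixtime alternative, assuming the pattern-based overload of unix_timestamp (the literal T and Z must be single-quoted in the SimpleDateFormat pattern); note that it truncates to whole seconds:

-- parse the ISO-8601 string into epoch seconds, then format it back
-- as a Hive timestamp string; any milliseconds are dropped
select from_unixtime(unix_timestamp("2019-11-02T20:18:00Z", "yyyy-MM-dd'T'HH:mm:ss'Z'"));
-- Result: 2019-11-02 20:18:00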

What happens when a Hive insert fails halfway?

岁酱吖の submitted on 2021-02-15 03:13:21
Question: Suppose an insert is expected to load 100 records into Hive, and the insert fails after 40 records have been inserted. Will the transaction roll back completely, undoing the 40 records, or will we see 40 records in the Hive table even though the insert query failed? Answer 1: The operation is atomic (even for a non-ACID table): if you insert or rewrite data using HiveQL, it writes the data into a temporary location, and only if the command succeeds are the files moved to the table location, so a failed insert leaves the table unchanged.

epoch with milliseconds to timestamp with milliseconds conversion in Hive

女生的网名这么多〃 submitted on 2021-02-10 20:14:05
Question: How can I convert a Unix epoch with milliseconds to a timestamp with milliseconds in Hive? Neither the cast() nor the from_unixtime() function gets me the timestamp with milliseconds. I tried .SSS, but the function just shifts the year, because it treats the whole value as seconds instead of taking the last three digits as milliseconds: scala> spark.sql("select from_unixtime(1598632101000, 'yyyy-MM-dd hh:mm:ss.SSS')").show(false) …
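
A minimal sketch of one workaround, assuming the epoch value is a BIGINT in milliseconds: from_unixtime() expects seconds, so split the value into a seconds part and a millisecond remainder and reassemble the two.

-- from_unixtime takes epoch seconds; keep the last three digits as the
-- zero-padded millisecond fraction and concatenate them back on
select timestamp(concat(
    from_unixtime(floor(1598632101123 / 1000)),
    '.',
    lpad(cast(1598632101123 % 1000 as string), 3, '0')));
-- e.g. 2020-08-28 16:28:21.123 (the date part depends on the session time zone)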

Why can't Hive support non-equi joins?

感情迁移 submitted on 2021-02-10 18:14:37
Question: I found that Hive does not support non-equi joins. Is it just because it is difficult to convert a non-equi join to MapReduce? Answer 1: Yes, the problem is in the current MapReduce implementation. How is a common equi-join implemented in MapReduce? Input records are copied in chunks to the mappers; the mappers produce output as key-value pairs, which are collected and distributed between reducers using some function in such a way that each reducer processes a whole key. In other words, the mappers must emit the join key as the shuffle key so that rows with equal keys meet on the same reducer, and a non-equi condition provides no such key to partition on.
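
A common workaround can be sketched as a filter over a cross join, assuming two hypothetical tables t1 and t2; Hive accepts this form, at the cost of materializing the full Cartesian product, so it is only viable when at least one side is small.

-- the non-equi condition moves from ON into WHERE over a cross join;
-- every pair of rows is generated before the filter is applied
select a.id, b.id
from t1 a
cross join t2 b
where a.start_ts < b.event_ts;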

How to provide arguments to an IN clause in Hive

蓝咒 submitted on 2021-02-10 17:33:48
Question: Is there any way to read arguments in a Hive query that can be substituted into an IN clause? I have the query below: select count(*) from table where id in ('1','2','3','4','5'). Is there any way to supply the arguments to the IN clause from a text file? Answer 1: Use in_file: put all the ids into a file, one id per row, then select count(*) from table where in_file(id, '/tmp/myfilename'); --local file. You can also pass the list of values as a single parameter to the IN clause: https://stackoverflow.com/a/56963448
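
That second option can be sketched with hivevar substitution, assuming a hypothetical id_list variable supplied on the command line; the variable is expanded textually before the query is parsed.

-- invoked as: hive --hivevar id_list="'1','2','3','4','5'" -f query.hql
select count(*)
from my_table                      -- my_table is a hypothetical table name
where id in (${hivevar:id_list});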

Is it possible to compress JSON in a Hive external table?

冷暖自知 submitted on 2021-02-10 13:33:16
Question: I want to know how to compress JSON data in a Hive external table. How can it be done? I have created the external table like this: CREATE EXTERNAL TABLE tweets (id BIGINT, created_at STRING, source STRING, favorited BOOLEAN) ROW FORMAT SERDE "com.cloudera.hive.serde.JSONSerDe" LOCATION "/user/cloudera/tweets"; and I have set the compression properties: set mapred.output.compress=true; set hive.exec.compress.output=true; set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec; set …
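
One common approach can be sketched as follows, assuming a hypothetical tweets_gz table that uses the same JSON SerDe: the compression settings above only affect data that Hive itself writes, so rewriting the rows through an INSERT produces gzip-compressed files, which a text-based JSON SerDe can then read transparently.

set hive.exec.compress.output=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
-- rewriting through INSERT OVERWRITE emits gzip-compressed output files
-- into the table location of the (hypothetical) tweets_gz table
INSERT OVERWRITE TABLE tweets_gz
SELECT * FROM tweets;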

Hive - How to cast an array to a string?

匆匆过客 submitted on 2021-02-08 10:13:00
Question: I'm trying to coerce a column containing a comma-separated array to a string in Hive: SELECT email_address, CAST(explode(GP_array AS STRING)) AS GP FROM dm.TP. I get the following error: Line: 1 - FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions. Answer 1: The explode function explodes an array into multiple rows: it returns a row-set with a single column (col), one row for each element of the array. You would need the concat_ws function to join the array elements into a single string instead.
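
A minimal sketch of that fix, assuming GP_array is an array<string> (cast the elements first if it is not):

-- concat_ws joins the array elements into one comma-delimited string,
-- so no row-generating UDTF is involved
SELECT email_address, concat_ws(',', GP_array) AS GP
FROM dm.TP;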