Dynamic partitioning in Hive through the exact inserted timestamp

大兔子大兔子 提交于 2021-02-04 21:05:39

问题


I need to insert data to a given external table which should be partitioned by the inserted date. My question is how is Hive handling the timestamp generation? When I select a timestamp for all inserted records like this:

WITH delta_insert AS (
        SELECT trg.*, from_unixtime(unix_timestamp()) AS generic_timestamp
        FROM target_table trg
        )
SELECT *
FROM delta_insert;

Will the timestamp always be identical for all records, even if the query takes a lot of time to un?

Or should I alternatively only select an actual timestamp and join it with everything that is selected afterwards?

WITH insert_timestamp AS (
    SELECT from_unixtime(unix_timestamp()) AS insert_timestamp
),
delta_insert AS (
        SELECT trg.*, insert_timestamp.insert_timestamp
        FROM target_table trg, insert_timestamp
        )
SELECT *
FROM delta_insert;

Since it's not recommended to do cross joins in Hive, I wonder what would be the best approach, since I don't have enough test data to simulate long running queries to be sure they get inserted within the same partition of a delta load.


回答1:


No, the timestamp generated using unix_timestamp() WILL NOT always be identical for all records. This function is non-deterministic and prevents proper optimization of queries - this has been deprecated since 2.0 in favour of CURRENT_TIMESTAMP constant (calculated only one time for the query). See this docs: Date functions This not applies to unix_timestamp(string date) with argument. unix_timestamp(string date) function is deterministic.

current_timestamp returns the current timestamp at the start of query evaluation (as of Hive 1.2.0). All calls of current_timestamp within the same query return the same value. If you need date, use current_date function.



来源:https://stackoverflow.com/questions/58078690/dynamic-partitioning-in-hive-through-the-exact-inserted-timestamp

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!