Question
I searched the Internet extensively but couldn't find an answer. Here is my question:
I'm writing some queries in Hive. I have a Unix timestamp (seconds since the epoch) and would like to convert it to a UTC time string, e.g., given the timestamp 1349049600, I would like to get the UTC time 2012-10-01 00:00:00. However, if I use the built-in function from_unixtime(1349049600) in Hive, I get the local PDT time 2012-09-30 17:00:00.
I realized there is a built-in function called from_utc_timestamp(timestamp, string timezone). I tried it as from_utc_timestamp(1349049600, "GMT"), but the output is 1970-01-16 06:44:09.6, which is totally incorrect.
I don't want to change the time zone of Hive permanently because there are other users. So is there any way I can convert 1349049600 to the UTC timestamp string "2012-10-01 00:00:00"? Thanks a lot!
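As a quick sanity check on the numbers in the question, here is a minimal Python sketch (not Hive). It assumes PDT can be modeled as a fixed UTC-7 offset, and the helper name format_epoch is hypothetical, introduced only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: PDT is modeled as a fixed UTC-7 offset (no DST rules applied).
PDT = timezone(timedelta(hours=-7), "PDT")


def format_epoch(seconds, tz):
    """Render epoch seconds as 'yyyy-MM-dd HH:mm:ss' in the given zone (hypothetical helper)."""
    return datetime.fromtimestamp(seconds, tz=tz).strftime("%Y-%m-%d %H:%M:%S")


utc_text = format_epoch(1349049600, timezone.utc)
pdt_text = format_epoch(1349049600, PDT)

print(utc_text)  # 2012-10-01 00:00:00
print(pdt_text)  # 2012-09-30 17:00:00
```

The same instant renders as two different wall-clock strings, which is why from_unixtime() on a Pacific-time server shows 17:00:00 on the previous day.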
Answer 1:
As far as I can tell, from_utc_timestamp() needs a date string argument, like "2014-01-15 11:21:15", not a Unix seconds-since-epoch value. That might be why it gives odd results when you pass an integer.
The only Hive function that deals with epoch seconds seems to be from_unixtime(), which gives you a timestamp string in the server timezone. I found mine in /etc/sysconfig/clock: "America/Montreal".
So you can get a UTC timestamp string via to_utc_timestamp(from_unixtime(1389802875),'America/Montreal'), and then convert to your target timezone with from_utc_timestamp().
It all seems very torturous, particularly having to wire your server TZ into your SQL. Life would be easier if there were a from_unixtime_utc() function or something.
Update: from_utc_timestamp() does accept a milliseconds argument as well as a string, but then gets the conversion wrong.
When I try from_utc_timestamp(1389802875000, 'America/Los_Angeles'), it gives "2014-01-15 03:21:15", which is wrong.
The correct answer is "2014-01-15 08:21:15", which you can get (for a server in Montreal) via from_utc_timestamp(to_utc_timestamp(from_unixtime(1389802875),'America/Montreal'), 'America/Los_Angeles').
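The arithmetic behind this answer can be checked outside Hive. The following Python sketch assumes fixed winter offsets (UTC-5 for Montreal, UTC-8 for Los Angeles) and reproduces each intermediate string. The last step is only an inference from the numbers, not confirmed Hive behaviour: treating the Montreal wall-clock time as if it were UTC and then shifting it to Los Angeles yields exactly the wrong value reported in the update.

```python
from datetime import datetime, timedelta, timezone

# Assumptions: fixed January offsets, EST = UTC-5 (Montreal), PST = UTC-8 (Los Angeles).
MONTREAL = timezone(timedelta(hours=-5))
LOS_ANGELES = timezone(timedelta(hours=-8))
FMT = "%Y-%m-%d %H:%M:%S"

instant = datetime.fromtimestamp(1389802875, tz=timezone.utc)

# What from_unixtime() would display on a server running in Montreal time.
server_local = instant.astimezone(MONTREAL).strftime(FMT)
# The UTC string obtained once the server offset is undone.
utc_text = instant.strftime(FMT)
# The same instant shown in the target zone.
la_text = instant.astimezone(LOS_ANGELES).strftime(FMT)

# Inference only: label the Montreal wall-clock time as UTC, then shift it to Los Angeles.
mislabeled = datetime.strptime(server_local, FMT).replace(tzinfo=timezone.utc)
wrong_text = mislabeled.astimezone(LOS_ANGELES).strftime(FMT)

print(server_local)  # 2014-01-15 11:21:15
print(utc_text)      # 2014-01-15 16:21:15
print(la_text)       # 2014-01-15 08:21:15
print(wrong_text)    # 2014-01-15 03:21:15
```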
Answer 2:
Hey, I just wanted to add a little here: I'd suggest "automating" the system timezone. So instead of hardcoding it:
-- static TZ declaration
to_utc_timestamp(from_unixtime(1389802875),'America/Montreal')
give this a shot:
-- dynamic TZ
select to_utc_timestamp(from_unixtime(1389802875), from_unixtime(unix_timestamp(), "z"));
This just uses the output format string of from_unixtime (a lowercase "z") to return the server's timezone string.
Answer 3:
Use it like this:
to_utc_timestamp(from_unixtime(timestamp),"PDT")
Answer 4:
This example solves the problem of hardwiring the system time zone (TZ) into your Hive code. It was run using Hive 0.10.0 in a CentOS environment with OpenJDK Java 1.6. Because it involves time manipulation, those precise software versions might matter. The system is currently operating in EDT. The table tblFiniteZahl is like DUAL but with about a million rows of, you guessed it, finite numbers; you can substitute any table with at least one row. The trick is to format the time in the local timezone using the z pattern to capture the timezone, then extract that value at runtime and pass it to the to_utc_timestamp function.
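select D1,
       D1E,
       D1L,
       D1LT,
       D1LZ,
       to_utc_timestamp(D1LT, D1LZ) as D1UTC
from (
  -- T4: split the local string into its time part (D1LT) and zone part (D1LZ)
  select D1,
         D1E,
         D1L,
         regexp_extract(D1L, '^([^ ]+[ ][^ ]+)[ ](.+)$', 1) as D1LT,
         regexp_extract(D1L, '^([^ ]+[ ][^ ]+)[ ](.+)$', 2) as D1LZ
  from (
    -- T3: format the epoch value in the server's local zone, including the zone name
    select D1,
           D1E,
           from_unixtime(D1E, 'yyyy-MM-dd HH:mm:ss z') as D1L
    from (
      -- T2: parse the input string into epoch seconds
      select D1,
             unix_timestamp(D1,'yyyy-MM-dd HH:mm:ss Z') as D1E
      from (
        -- T1: a single sample row
        select '2015-08-24 01:15:23 UTC' as D1
        from tblFiniteZahl
        limit 1
      ) T1
    ) T2
  ) T3
) T4
;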
The result is
D1 = 2015-08-24 01:15:23 UTC
D1E = 1440378923
D1L = 2015-08-23 21:15:23 EDT
D1LT = 2015-08-23 21:15:23
D1LZ = EDT
D1UTC = 2015-08-23 21:15:23
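This illustrates that to_utc_timestamp accepts EDT as its second argument. Note, however, that D1UTC in this output equals the local time D1LT rather than the original 01:15:23 UTC, so it is worth verifying that the abbreviation is actually applied as an offset on your system.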
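To make the splitting step concrete, here is a minimal Python sketch (not Hive) that applies the same regular expression to the D1L value and then converts the local time to UTC. The OFFSETS table and the helper names split_local and local_to_utc are hypothetical, and EDT is assumed to mean UTC-4. Under that assumption the arithmetic gives 2015-08-24 01:15:23, the same instant as D1.

```python
import re
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M:%S"
# Same pattern as in the query: group 1 is "date time", group 2 is the zone text.
PATTERN = re.compile(r"^([^ ]+[ ][^ ]+)[ ](.+)$")

# Assumption: hypothetical lookup table; zone abbreviations are not unique in general.
OFFSETS = {"EDT": timedelta(hours=-4), "EST": timedelta(hours=-5)}


def split_local(text):
    """Split 'yyyy-MM-dd HH:mm:ss z' into (time part, zone part)."""
    match = PATTERN.match(text)
    if match is None:
        raise ValueError("unexpected format: " + text)
    return match.group(1), match.group(2)


def local_to_utc(text):
    """Convert a local time string with a zone abbreviation to a UTC string."""
    local_time, zone = split_local(text)
    aware = datetime.strptime(local_time, FMT).replace(tzinfo=timezone(OFFSETS[zone]))
    return aware.astimezone(timezone.utc).strftime(FMT)


parts = split_local("2015-08-23 21:15:23 EDT")
utc_text = local_to_utc("2015-08-23 21:15:23 EDT")

print(parts)     # ('2015-08-23 21:15:23', 'EDT')
print(utc_text)  # 2015-08-24 01:15:23
```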
Answer 5:
I went to currentmillis.com and pasted 1349049600 without realizing it was actually seconds. It returned a date of 1970-01-16, which suggests that the function you mentioned, from_utc_timestamp, actually takes milliseconds as its first parameter. Maybe you can try again with from_utc_timestamp(1349049600000, "GMT")?
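The milliseconds reading is easy to verify with a short Python sketch (not Hive). It assumes the asker's server was on Pacific time, modeled here as a fixed UTC-8 offset for January 1970; the helper name from_millis is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Assumption: Pacific standard time modeled as a fixed UTC-8 offset.
PST = timezone(timedelta(hours=-8))


def from_millis(millis, tz):
    """Interpret an integer as milliseconds since the epoch, shown in zone tz (hypothetical helper)."""
    return (EPOCH + timedelta(milliseconds=millis)).astimezone(tz)


# The question's value read as milliseconds instead of seconds.
as_millis = from_millis(1349049600, PST).strftime("%Y-%m-%d %H:%M:%S.%f")
# The same value scaled to milliseconds first.
scaled = from_millis(1349049600 * 1000, timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

print(as_millis)  # 1970-01-16 06:44:09.600000
print(scaled)     # 2012-10-01 00:00:00
```

The first result matches the 1970-01-16 06:44:09.6 output quoted in the question, which supports the idea that the integer was read as milliseconds.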
Source: https://stackoverflow.com/questions/18278786/local-time-convert-to-utc-time-in-hive