Accessing HBase table data from Hive based on Time Stamp

Posted by 主宰稳场 on 2019-12-11 12:38:31

Question


I created an HBase table whose column family keeps up to 10 versions:

create 'tablename',{NAME => 'cf', VERSIONS => 10}

and inserted two rows (row1 and row2), updating cf:name on row2 several times:

put 'tablename','row1','cf:id','row1id'
put 'tablename','row1','cf:name','row1name'
put 'tablename','row2','cf:id','row2id'
put 'tablename','row2','cf:name','row2name'
put 'tablename','row2','cf:name','row2nameupdate'
put 'tablename','row2','cf:name','row2nameupdateagain'
put 'tablename','row2','cf:name','row2nameupdateonemoretime'

I then read the data back with a scan:

scan 'tablename',{RAW => true, VERSIONS => 10}

I can see the data for all versions.

Next, I created a Hive external table that points to this HBase table:

CREATE EXTERNAL TABLE hive_timestampupdate(key string, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:name")
TBLPROPERTIES ("hbase.table.name" = "tablename");

When I query hive_timestampupdate, I can see the data from the HBase table:

select * from hive_timestampupdate;

I want to query this data by timestamp. Is there a way to query it from Hive based on the HBase table's timestamps?


Answer 1:


Unfortunately, no. According to the Hive HBase Integration document,

there is currently no way to access the HBase timestamp attribute, and queries always access data with the latest timestamp.

There are some JIRAs about timestamp-related functionality, but they don't really do what you are asking, and they haven't gotten a great reception :(
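
If querying from the HBase side is acceptable, the HBase shell can select cells by timestamp directly. The following is a minimal sketch against the table from the question, not something Hive exposes through the storage handler. The timestamp values are placeholders (assumptions, in epoch milliseconds); substitute the ones printed by the first command.

```ruby
# HBase shell commands (the shell is JRuby, so '#' starts a comment).

# List up to 10 stored versions of cf:name for row2, with their timestamps.
get 'tablename', 'row2', {COLUMN => 'cf:name', VERSIONS => 10}

# Fetch the cell written at one exact timestamp.
# 1427800000000 is a placeholder; use a timestamp printed by the command above.
get 'tablename', 'row2', {COLUMN => 'cf:name', TIMESTAMP => 1427800000000}

# Scan cells whose timestamps fall in [start, end); the end bound is exclusive.
# Both bounds are placeholders.
scan 'tablename', {COLUMNS => ['cf:name'], TIMERANGE => [1427800000000, 1427900000000], VERSIONS => 10}
```

The results of these commands stay in HBase. Getting them into Hive would still need a separate step, such as exporting them to a table Hive can read.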



Source: https://stackoverflow.com/questions/29371987/accessing-hbase-table-data-from-hive-based-on-time-stamp
