SparkSQL on HBase Tables

别那么骄傲 2021-01-01 00:34

Is anybody using SparkSQL on HBase tables directly, like SparkSQL on Hive tables? I am new to Spark. Please guide me on how to connect HBase and Spark. How to query on hbase tabl

1 answer
  • 2021-01-01 01:02

    AFAIK there are 2 ways to connect to HBase tables:

    - Connect to HBase directly:

    Connect to HBase directly, create a DataFrame from the resulting RDD, and execute SQL on top of that. I'm not going to reinvent the wheel: please see "How to read from hbase using spark", where the answer from @iMKanchwala already describes it. The only extra step is to convert the RDD into a DataFrame (using toDF) and follow the SQL approach.
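
The toDF step could look roughly like the PySpark sketch below. It assumes the HBase rows have already been read into an RDD of (row_key, cells) pairs, with cells keyed by b"family:qualifier"; that shape, the column names, and the helper names to_record and query_users are illustrative assumptions, not part of the linked answer.

```python
from typing import Dict, Tuple

# Hypothetical column layout, borrowed from the users example in this answer.
NAME_COL = b"small:name"
EMAIL_COL = b"small:email"


def to_record(row_key: bytes, cells: Dict[bytes, bytes]) -> Tuple[str, str, str]:
    """Flatten one HBase row into a (userid, name, email) tuple of strings."""
    return (
        row_key.decode("utf-8"),
        cells.get(NAME_COL, b"").decode("utf-8"),
        cells.get(EMAIL_COL, b"").decode("utf-8"),
    )


def query_users(spark, hbase_rdd):
    """Turn an RDD of (row_key, cells) pairs into a DataFrame and run SQL on it.

    `spark` is an existing SparkSession; how `hbase_rdd` was read from HBase
    is an assumption here and is not shown.
    """
    df = hbase_rdd.map(lambda kv: to_record(kv[0], kv[1])).toDF(["userid", "name", "email"])
    df.createOrReplaceTempView("users")  # expose the DataFrame to Spark SQL
    return spark.sql("SELECT userid, name FROM users WHERE email <> ''")
```

Keeping to_record free of Spark imports means the row-flattening logic can be checked without a cluster; only query_users needs a live SparkSession.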

    - Register the table as a Hive external table with the HBase storage handler; you can then use Hive on Spark from a HiveContext. This is also an easy way.

    Ex:
    -- One mapping entry per Hive column; :key binds userid to the HBase row key.
    CREATE TABLE users(
      userid int, name string, email string, notes string)
    STORED BY
      'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES (
      "hbase.columns.mapping" =
        ":key,small:name,small:email,large:notes");
    

    For how to do that, please see this example.
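
Once the table is registered in the Hive metastore, querying it from Spark might look like the sketch below. It assumes Spark 2.x or later (where SparkSession with enableHiveSupport() takes the place of HiveContext), that the DDL was run in Hive itself, that the Hive HBase handler and HBase client jars are on Spark's classpath, and that the table is the users table from the example. The function names and the userid filter are made up for illustration.

```python
def users_query(min_userid: int) -> str:
    """Build the SQL text for reading the Hive table that is backed by HBase."""
    # int() guards against anything other than a number ending up in the SQL text.
    return f"SELECT userid, name, email FROM users WHERE userid >= {int(min_userid)}"


def query_hive_hbase_table(min_userid: int = 0):
    """Run the query through a Hive-enabled SparkSession and return a DataFrame."""
    # Imported lazily so users_query() can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hbase-via-hive")  # hypothetical application name
        .enableHiveSupport()        # lets Spark see tables in the Hive metastore
        .getOrCreate()
    )
    return spark.sql(users_query(min_userid))
```

Calling query_hive_hbase_table(100).show() would then print the matching rows, provided the metastore and HBase are reachable from the Spark driver and executors.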

    I would prefer approach 1.

    Hope that helps...
