Import complex JSON data into Hive

死守一世寂寞 2021-01-27 00:41

A little spoon-feeding is required: how do I import complex JSON into Hive? The JSON file has the format: {"some-headers":"", "dump":[{"item-id":"item-1"},{"item-id": ...

2 answers
  • 2021-01-27 01:04

    Posting an end-to-end solution: a step-by-step procedure to convert JSON into a Hive table.

    step 1) install maven if not there already

    >$ sudo apt-get install maven

    step 2) install git if not there already, then, from $HOME, clone the Hive-JSON-Serde repository

    >$ sudo apt-get install git
    >$ sudo git clone https://github.com/rcongiu/Hive-JSON-Serde.git

    step 3) go into the $HOME/Hive-JSON-Serde folder

    step 4) build the serde package

    >sudo mvn -Pcdh5 clean package

    step 5) The serde file will be in $HOME/Hive-JSON-Serde/json-serde/target/json-serde-1.3.7-SNAPSHOT-jar-with-dependencies.jar

    step 6) Add serde as dependency jar in hive

     hive> ADD JAR $HOME/Hive-JSON-Serde/json-serde/target/json-serde-1.3.7-SNAPSHOT-jar-with-dependencies.jar;
    

    step 7) create json file in $HOME/books.json (Example)

    {"value": [{"id": "1","bookname": "A","properties": {"subscription": "1year","unit": "3"}},{"id": "2","bookname":"B","properties":{"subscription": "2years","unit": "5"}}]}
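
With a TEXTFILE table, Hive hands the SerDe one line at a time, so the whole JSON document has to sit on a single line, as in the example above. A minimal Python sketch for producing such a file, assuming Python 3 is available; the `write_single_line_json` helper and the output path are hypothetical and not part of the original answer:

```python
import json
import os
import tempfile

# Sample data from step 7.
books = {
    "value": [
        {"id": "1", "bookname": "A", "properties": {"subscription": "1year", "unit": "3"}},
        {"id": "2", "bookname": "B", "properties": {"subscription": "2years", "unit": "5"}},
    ]
}

def write_single_line_json(record, path):
    """Write one JSON document as exactly one line (hypothetical helper)."""
    line = json.dumps(record)  # json.dumps emits no raw newlines unless indent= is set
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return line

# Written to a temp directory here; point this at $HOME/books.json for the real run.
out_path = os.path.join(tempfile.gettempdir(), "books.json")
written = write_single_line_json(books, out_path)
```

If the source file is pretty-printed across several lines, loading it with `json.load` and rewriting it this way is one option for getting it into a shape the table can read.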
    

    step 8) create tmp1 table in hive

    hive>CREATE TABLE tmp1 (
        value ARRAY<struct<id:string,bookname:string,properties:struct<subscription:string,unit:string>>>
    )
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    WITH SERDEPROPERTIES (
        'mapping.value' = 'value'
    )
    STORED AS TEXTFILE;
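
Writing the nested `ARRAY<struct<...>>` type by hand gets error-prone for larger documents. A rough Python sketch that derives the Hive type string from a sample record; `hive_type` is a hypothetical helper, and the type choices (integers as `bigint`, floats as `double`, nulls as `string`, array element type taken from the first element) are assumptions to adjust for your data:

```python
import json

def hive_type(value):
    """Return a Hive type string for a parsed JSON value (hypothetical helper)."""
    if isinstance(value, dict):
        fields = ",".join(f"{key}:{hive_type(val)}" for key, val in value.items())
        return f"struct<{fields}>"
    if isinstance(value, list):
        if not value:
            raise ValueError("cannot infer the element type of an empty array")
        return f"array<{hive_type(value[0])}>"
    if isinstance(value, bool):  # must come before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"  # strings, and null as an assumption

sample = json.loads(
    '{"value": [{"id": "1", "bookname": "A", '
    '"properties": {"subscription": "1year", "unit": "3"}}]}'
)
column_type = hive_type(sample["value"])
print(f"value {column_type}")
# prints: value array<struct<id:string,bookname:string,properties:struct<subscription:string,unit:string>>>
```

The sketch does not quote or rename keys such as `item-id` from the question, which are not valid unquoted Hive identifiers, so those would still need handling by hand.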
    

    step 9) load the data from json to tmp1 table

    hive>LOAD DATA LOCAL INPATH '$HOME/books.json' INTO TABLE tmp1;
    

    step 10) create a tmp2 table by running explode on tmp1. This intermediate step breaks the multi-level JSON structure into multiple rows, one per array element. Note: if your JSON structure is simple and single-level, skip this step.

    hive>create table tmp2 as 
     SELECT *
     FROM tmp1
     LATERAL VIEW explode(value) itemTable AS items;
    

    step 11) create hive table and load the values from tmp2 table

    hive>create table books as 
    select items.id as id, items.bookname as name, items.properties.subscription as subscription, items.properties.unit as unit from tmp2;
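
To see what steps 10 and 11 should produce, the flattening can be mimicked outside Hive. A minimal Python sketch, assuming the sample data from step 7; `explode_books` is a hypothetical helper that plays the role of `LATERAL VIEW explode(value)` followed by picking the struct fields:

```python
import json

raw = (
    '{"value": [{"id": "1","bookname": "A","properties": {"subscription": "1year","unit": "3"}},'
    '{"id": "2","bookname":"B","properties":{"subscription": "2years","unit": "5"}}]}'
)

def explode_books(record):
    """Yield one (id, name, subscription, unit) tuple per element of record["value"]."""
    for item in record["value"]:  # one output row per array element, like explode()
        props = item["properties"]
        yield (item["id"], item["bookname"], props["subscription"], props["unit"])

rows = list(explode_books(json.loads(raw)))
for row in rows:
    print(" ".join(row))
# prints:
# 1 A 1year 3
# 2 B 2years 5
```

Each array element becomes its own row, so the fields have to be read from the exploded element rather than from a fixed index into the original array.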
    

    step 12) drop tmp tables

    hive>drop table tmp1;
    hive>drop table tmp2;
    

    step 13) test the hive table

    hive>select * from books;
    

    output:

    id name subscription unit

    1 A 1year 3

    2 B 2years 5

  • 2021-01-27 01:31

    You can import JSON into Hive by using a Hive SerDe (serializer/deserializer) that understands JSON.

    This project serves as a sample implementation:

    https://github.com/rcongiu/Hive-JSON-Serde

    You can also refer to this related question:

    How do you make a HIVE table out of JSON data?
