A little spoon feeding required, how to import complex json into hive. Json file in the format of:{\"some-headers\":\"\", \"dump\":[{\"item-id\":\"item-1\"},{\"item-id\":\
posting End-to-End solution. Step by step procedure to convert JSON to hive table:
step 1) install maven if not there already
>$ sudo apt-get install maven
step 2) install git if not there already
>sudo git clone https://github.com/rcongiu/Hive-JSON-Serde.git
step 3) go into the $HOME/HIVE-JSON_Serde folder
step 4) build the serde package
>sudo mvn -Pcdh5 clean package
step 5) The serde file will be in $HOME/Hive-JSON-Serde/json-serde/target/json-serde-1.3.7-SNAPSHOT-jar-with-dependencies.jar
step 6) Add serde as dependency jar in hive
hive> ADD JAR $HOME/Hive-JSON-Serde/json-serde/target/json-serde-1.3.7- SNAPSHOT-jar-with-dependencies.jar;
step 7) create json file in $HOME/books.json (Example)
{"value": [{"id": "1","bookname": "A","properties": {"subscription": "1year","unit": "3"}},{"id": "2","bookname":"B","properties":{"subscription": "2years","unit": "5"}}]}
step 8) create tmp1 table in hive
hive>CREATE TABLE tmp1 (
value ARRAY<struct<id:string,bookname:string,properties:struct<subscription:string,unit:string>>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'mapping.value' = 'value'
)
STORED AS TEXTFILE;
step 9) load the data from json to tmp1 table
>LOAD DATA LOCAL INPATH '$HOME/books.json' INTO TABLE tmp1;
step 10) create a tmp2 table to do explode operation form tmp1, this intermediate step is to break multi level json structure into multiple rows Note: if your JSON structure is simple and single level , avoid this step
hive>create table tmp2 as
SELECT *
FROM tmp1
LATERAL VIEW explode(value) itemTable AS items;
step 11) create hive table and load the values from tmp2 table
hive>create table books as
select value[0].id as id, value[0].bookname as name, value[0].properties.subscription as subscription, value[0].properties.unit as unit from tmp2;
step 12) drop tmp tables
hive>drop table tmp1;
hive>drop table tmp2;
step 13) test the hive table
hive>select * from books;
output:
id name subscription unit
1 B 1year 3
2 B 2years 5
You can import JSON into Hive by implementing the HiveSerDe.
This link serves as a sample implementation.
https://github.com/rcongiu/Hive-JSON-Serde
You can also refer to these links
How do you make a HIVE table out of JSON data?