How do I import an array of data into separate rows in a hive table?

纵饮孤独 提交于 2019-12-04 17:31:53

JSON records in data files must appear one per line, an empty line would produce a NULL record.

This json should work

{ "identifier" : "id#1", "dataA" : "dataA#1" }, { "identifier" : "id#2", "dataA" : "dataA#2" }

Here is what you need

Method 1: Adding name to the array

Data

{"data":[{"identifier" : "id#1","dataA" : "dataA#1"},{"identifier" : "id#2","dataA" : "dataA#2"}]}

SQL

SET hive.support.sql11.reserved.keywords=false;

CREATE EXTERNAL TABLE IF NOT EXISTS ramesh_test (
  data array<
    struct<
      identifier:STRING, 
      dataA:STRING
    >
  >
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 'my_location';

SELECT rows.identifier,
       rows.dataA
  FROM ramesh_test d
LATERAL VIEW EXPLODE(d.data) d1 AS rows  ;

Output

Method 2 - No Changes to the data

Data

[{"identifier":"id#1","dataA":"dataA#1"},{"identifier":"id#2","dataA":"dataA#2"}]

SQL

CREATE EXTERNAL TABLE IF NOT EXISTS ramesh_raw_json (
  json STRING
)
LOCATION 'my_location';

SELECT get_json_object (exp.json_object, '$.identifier') AS Identifier,
       get_json_object (exp.json_object, '$.dataA') AS Identifier
  FROM ( SELECT json_object
           FROM ramesh_raw_json a
           LATERAL VIEW EXPLODE (split(regexp_replace(regexp_replace(a.json,'\\}\\,\\{','\\}\\;\\{'),'\\[|\\]',''), '\\;')) json_exploded AS json_object ) exp;

Output

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!