I am trying to import data in the following format into a hive table
[
{
"identifier" : "id#1",
"dataA" : "dataA#1"
},
{
"identifier" : "id#2",
"dataA" : "dataA#2"
}
]
I have multiple files like this and I want each {} to form one row in the table. This is what I have tried:
CREATE EXTERNAL TABLE final_table(
identifier STRING,
dataA STRING
) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION "s3://bucket/path_in_bucket/"
This is not creating a single row for each {} though. I have also tried
CREATE EXTERNAL TABLE final_table(
rows ARRAY< STRUCT<
identifier: STRING,
dataA: STRING
>>
) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION "s3://bucket/path_in_bucket/"
but this is not work either. Is there some way of specifying that the input as an array with each record being an item in the array to the hive query? Any suggestions on what to do?
JSON records in data files must appear one per line, an empty line would produce a NULL record.
This json should work
{ "identifier" : "id#1", "dataA" : "dataA#1" }, { "identifier" : "id#2", "dataA" : "dataA#2" }
Here is what you need
Method 1: Adding name to the array
Data
{"data":[{"identifier" : "id#1","dataA" : "dataA#1"},{"identifier" : "id#2","dataA" : "dataA#2"}]}
SQL
SET hive.support.sql11.reserved.keywords=false;
CREATE EXTERNAL TABLE IF NOT EXISTS ramesh_test (
data array<
struct<
identifier:STRING,
dataA:STRING
>
>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 'my_location';
SELECT rows.identifier,
rows.dataA
FROM ramesh_test d
LATERAL VIEW EXPLODE(d.data) d1 AS rows ;
Output
Method 2 - No Changes to the data
Data
[{"identifier":"id#1","dataA":"dataA#1"},{"identifier":"id#2","dataA":"dataA#2"}]
SQL
CREATE EXTERNAL TABLE IF NOT EXISTS ramesh_raw_json (
json STRING
)
LOCATION 'my_location';
SELECT get_json_object (exp.json_object, '$.identifier') AS Identifier,
get_json_object (exp.json_object, '$.dataA') AS Identifier
FROM ( SELECT json_object
FROM ramesh_raw_json a
LATERAL VIEW EXPLODE (split(regexp_replace(regexp_replace(a.json,'\\}\\,\\{','\\}\\;\\{'),'\\[|\\]',''), '\\;')) json_exploded AS json_object ) exp;
Output
来源:https://stackoverflow.com/questions/47522270/how-do-i-import-an-array-of-data-into-separate-rows-in-a-hive-table