问题
I have nested JSON of type
[{
"emails": [{
"label": "",
"primary": "",
"relationdef_id": "",
"type": "",
"value": ""
}],
"licenses": [{
"allocated": "",
"parent_type": "",
"parentid": "",
"product_type": "",
"purchased_license_id": "",
"service_type": ""
}, {
"allocated": "",
"parent_type": "",
"parentid": "",
"product_type": "",
"purchased_license_id": "",
"service_type": ""
}]
}, {
"emails": [{
"label": "",
"primary": "",
"relationdef_id": "",
"type": "",
"value": ""
}],
"licenses": [{
"allocated": "2016-04-26 01:46:26",
"parent_type": "",
"parentid": "",
"product_type": "",
"purchased_license_id": "",
"service_type": ""
}]
}]
which is not able to be converted to athena table.
I have tried to update it to list of objects also
{
"emails": [{
"label": "",
"primary": "",
"relationdef_id": "",
"type": "",
"value": ""
}
],
"licenses": [{
"allocated": "",
"parent_type": "",
"parentid": "",
"product_type": "",
"purchased_license_id": "",
"service_type": ""
},{
"allocated": "",
"parent_type": "",
"parentid": "",
"product_type": "",
"purchased_license_id": "",
"service_type": ""
}
]
}
{
"emails": [{
"label": "",
"primary": "",
"relationdef_id": "",
"type": "",
"value": ""
}
],
"licenses": [{
"allocated": "",
"parent_type": "",
"parentid": "",
"product_type": "",
"purchased_license_id": "",
"service_type": ""
}
]
}
{
"emails": [{
"label": "",
"primary": "",
"relationdef_id": "",
"type": "",
"value": ""
}
],
"licenses": [{
"allocated": "",
"parent_type": "",
"parentid": "",
"product_type": "",
"purchased_license_id": "",
"service_type": ""
}
]
}
with Query:
CREATE EXTERNAL TABLE `test_orders1`(
`emails` array<struct<`label`: string, `primary`: string,`relationdef_id`: string,`type`: string, `value`: string>>,
`licenses` array<struct<`allocated`: string, `parent_type`: string, `parentid`: string, `product_type`: string,`purchased_license_id`: string, `service_type`: string>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
but only 1 row is formed. Is there a way where i can use Nested json of type JSONArray into Athena table? Or how can I change Nested Json that will work for me?
回答1:
When querying JSON data Athena requires the files to be formatted with one JSON document per line. It's unclear from your question if this is the case or not, the examples you give are multiline, but perhaps that's only to make the question more clear.
The table DDL you include looks like it should work on the second example data, provided that it is formatted as one document per line, e.g.
{"emails": [{"label": "", "primary": "", "relationdef_id": "", "type": "", "value": ""}], "licenses": [{"allocated": "", "parent_type": "", "parentid": "", "product_type": "", "purchased_license_id": "", "service_type": ""}, { "allocated": "", "parent_type": "", "parentid": "", "product_type": "", "purchased_license_id": "", "service_type": ""}]}
{"emails": [{"label": "", "primary": "", "relationdef_id": "", "type": "", "value": ""}], "licenses": [{"allocated": "", "parent_type": "", "parentid": "", "product_type": "", "purchased_license_id": "", "service_type": ""}]}
{"emails": [{"label": "", "primary": "", "relationdef_id": "", "type": "", "value": ""}], "licenses": [{"allocated": "", "parent_type": "", "parentid": "", "product_type": "", "purchased_license_id": "", "service_type": ""}]}
来源:https://stackoverflow.com/questions/63841006/create-table-in-athena-from-nested-json