Create Athena table from nested json source

予麋鹿 2021-01-07 06:35

How can I create an Athena table from a nested JSON file? This is my sample JSON file. I only need selected key-value pairs, such as roofCondition and garageStalls.

2 answers
  • 2021-01-07 07:19

    It seems that this format also works, even though the file as a whole is not valid JSON:

    Every row of the table is one line in the JSON file.

    There are no spaces and no comma at the end of a line, just a newline between table rows.

     {"is_active":"True","title":"mr","first_name":"admindoc","last_name":"admindoc","birthdate":"2003-09-01","home_phone":"+654654","mobile_phone":"+654654","gender":"m","language":"fr","email":"xxx+admine@sinnovation.com"}
     {"is_active":"True","title":"mr","first_name":"dok","last_name":"dok","birthdate":"1998-02-03","home_phone":"None","mobile_phone":"+654654","gender":"m","language":"fr","email":"xxx+docteur@sinnovation.com"}
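If the source data is a pretty-printed JSON array rather than one record per line, it can be rewritten into the layout described above before uploading. The following is a minimal sketch, assuming the whole file fits in memory; the `to_json_lines` helper and the two-record input are hypothetical and only illustrate the idea.

```python
import json

def to_json_lines(records):
    """Serialize each record as compact JSON on its own line, with no trailing commas."""
    # separators=(",", ":") drops the spaces that json.dumps inserts by default
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records)

# Hypothetical input: records stored as a pretty-printed JSON array
pretty = """
[
  {"first_name": "admindoc", "language": "fr"},
  {"first_name": "dok", "language": "fr"}
]
"""

lines = to_json_lines(json.loads(pretty))
print(lines)
```

Each output line is a complete JSON object on its own, which is the one-record-per-line shape the answer describes.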
    
  • 2021-01-07 07:22

    First of all, the JSON document you posted is not correct. A corrected version looks like this:

    {"reportId":"7bc7fa76-bf53-4c21-85d6-118f6a8f4244", "reportOrderedTS":"1529996028730", "createdTS":"1530304910154", "report":{"summaryElements": [{"value": "GOOD", "key": "roofCondition"},{"value": "98", "key": "storiesConfidence"},{"value": "0", "key": "garageStalls"}], "elements": [{"source": "xyz", "imageId": "0xxx_png", "modelVersion": "1.21.0", "key": "pool"},{"source": "xyz", "imageId": "0111_png", "value": "GOOD", "modelVersion": "1.36.0", "key": "roofCondition", "confidence": "49"}] }, "status":"Success", "reportReceivedTS":"1529996033830" }
    

    Yes, you can query nested JSON in Athena. You can achieve this, for example, by creating the following table:

    CREATE EXTERNAL TABLE example(
    `reportId` string,
    `reportOrderedTS` bigint,
    `createdTS` bigint,
    `report` struct<
    `summaryElements`: array<struct<`value`:string, `key`: string>>,
    `elements`: array<struct<`source`: string, `imageId`:string, `modelVersion`:string, `key`:string, `value`:string,  `confidence`:int>>>, 
    `status` string, 
    `reportReceivedTS` bigint
    )
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    LOCATION 's3://example'  
    

    This is an example query:

    select reportid, reportorderedts, createdts,
    summaryelements.value, summaryelements.key, elements.source, elements.key
    from example, UNNEST(report.summaryelements) t1(summaryelements), UNNEST(report.elements) t2(elements)
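The question asks only for selected keys such as roofCondition and garageStalls. In Athena, that would presumably mean filtering the unnested rows on `summaryelements.key`. To sanity-check what such a result should contain for the sample document, the flattening can be emulated in plain Python. This is only a sketch: `selected_summary` and `WANTED` are hypothetical names, and the document is trimmed to the fields used here.

```python
import json

# Sample document from the answer above, trimmed to the fields used here
doc = json.loads("""
{"reportId": "7bc7fa76-bf53-4c21-85d6-118f6a8f4244",
 "report": {"summaryElements": [
     {"value": "GOOD", "key": "roofCondition"},
     {"value": "98", "key": "storiesConfidence"},
     {"value": "0", "key": "garageStalls"}]}}
""")

WANTED = {"roofCondition", "garageStalls"}  # keys the question asks for

def selected_summary(doc, wanted):
    """Return one (reportId, key, value) row per wanted summary element.

    This mirrors unnesting report.summaryElements and keeping only
    the rows whose key is in `wanted`.
    """
    return [
        (doc["reportId"], el["key"], el["value"])
        for el in doc["report"]["summaryElements"]
        if el["key"] in wanted
    ]

rows = selected_summary(doc, WANTED)
for row in rows:
    print(row)
```

For the sample, this yields two rows, one for roofCondition and one for garageStalls; the storiesConfidence element is dropped by the filter.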
    

    Useful links:

    https://docs.aws.amazon.com/athena/latest/ug/flattening-arrays.html

    https://docs.aws.amazon.com/athena/latest/ug/rows-and-structs.html
