Athena DDL for Ion format?

Submitted by 五迷三道 on 2021-01-05 07:22:46

Question


I'm trying to use Athena to query some files that are in Ion format produced by the recently added Export To S3 feature of DynamoDB backups.

This is a blatantly stupid format which is basically the string $ion_1_0 followed by JSON. The unquoted $ion_1_0 string at the front makes the data invalid JSON.
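To make the complaint concrete, here is a minimal Python sketch (not part of the original question) that removes a leading $ion_1_0 marker and hands the rest of a record to json.loads. It assumes each record is a single line and that, apart from the marker, the line is plain JSON; the function names and the sample record are hypothetical. As the UPDATE below points out, the real export also has unquoted field names, so this only covers the simplified case described in this paragraph.

```python
import json

# Ion version marker that, per the question, prefixes the exported data.
ION_VERSION_MARKER = "$ion_1_0"


def strip_ion_version_marker(line: str) -> str:
    """Remove a leading $ion_1_0 marker (and surrounding whitespace) if present."""
    stripped = line.strip()
    if stripped.startswith(ION_VERSION_MARKER):
        stripped = stripped[len(ION_VERSION_MARKER):]
    return stripped.strip()


def parse_record(line: str) -> dict:
    """Parse one record as JSON after removing the version marker.

    Assumption: the remainder is valid JSON (quoted field names, JSON-style
    numbers, no Ion annotations). Real DynamoDB exports may not satisfy this.
    """
    return json.loads(strip_ion_version_marker(line))


if __name__ == "__main__":
    # Hypothetical record with quoted field names.
    sample = '$ion_1_0 {"Item": {"id": "abc", "count": 5}}'
    try:
        json.loads(sample)  # fails: the marker is not a JSON value
    except json.JSONDecodeError as err:
        print("raw line is not JSON:", err.msg)
    print(parse_record(sample))
```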

I tried using the Ion Serde from here:

CREATE EXTERNAL TABLE mydb.mytable (
  `myfields` string,
  ...
)
ROW FORMAT SERDE 'com.amazon.ionhiveserde.IonHiveSerDe'
LOCATION 's3:/.../dynamodb-export/AWSDynamoDB/01608775578817-a6944d97/data/'
TBLPROPERTIES ('has_encrypted_data'='true');

But got this:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: com.amazon.ionhiveserde.IonHiveSerDe

UPDATE

Actually, the format is even a little worse than I thought: the field names are not quoted, so it's still not valid JSON even after stripping the $ion_1_0 prefix.
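A minimal Python sketch of why stripping the prefix is not enough: once $ion_1_0 is removed, Python's json module still rejects an unquoted field name. The helper name and the sample record are hypothetical. Unquoted field names are legal Ion, so reading these files properly needs an Ion-aware reader (for example Amazon's ion-python library, or the Glue route suggested in the answer) rather than patching the text into JSON.

```python
import json
from typing import Optional

ION_VERSION_MARKER = "$ion_1_0"


def json_error_after_stripping(line: str) -> Optional[str]:
    """Return json's error message for a record once $ion_1_0 is removed,
    or None if the remainder parses as JSON."""
    body = line.strip()
    if body.startswith(ION_VERSION_MARKER):
        body = body[len(ION_VERSION_MARKER):].strip()
    try:
        json.loads(body)
    except json.JSONDecodeError as err:
        return err.msg
    return None


if __name__ == "__main__":
    # Hypothetical Ion record: field names are unquoted symbols.
    ion_line = '$ion_1_0 {Item:{id:"abc"}}'
    print(json_error_after_stripping(ion_line))
    # prints: Expecting property name enclosed in double quotes
```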


Answer 1:


Ion is an open-source data format whose text form is a superset of JSON. Have you tried converting your Ion file(s) with AWS Glue? Ion is one of the supported format options (for input): https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format.html

This QLDB workshop uses Ion in its example. You could explore its CloudFormation template (YAML), or deploy the workflow and dig into the crawler and job it creates for some ideas: https://qldb-immersionday.workshop.aws/en/lab3/task3.html

Check out the Ion cookbook for additional information: https://amzn.github.io/ion-docs/guides/cookbook.html

And the specification: https://amzn.github.io/ion-docs/docs/spec.html



Source: https://stackoverflow.com/questions/65433335/athena-ddl-for-ion-format
