Question
I'm trying to use Athena to query some files that are in Ion format produced by the recently added Export To S3 feature of DynamoDB backups.
This is a blatantly stupid format which is basically the string $ion_1_0 followed by JSON. The unquoted $ion_1_0 string at the front makes the data invalid JSON.
I tried using the Ion Serde from here:
CREATE EXTERNAL TABLE mydb.mytable (
`myfields` string,
...
)
ROW FORMAT SERDE 'com.amazon.ionhiveserde.IonHiveSerDe'
LOCATION 's3:/.../dynamodb-export/AWSDynamoDB/01608775578817-a6944d97/data/'
TBLPROPERTIES ('has_encrypted_data'='true');
But got this:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: com.amazon.ionhiveserde.IonHiveSerDe
UPDATE
Actually the format is even a little worse than I thought. The field names are not quoted, so it's not quite valid JSON even after stripping the $ion_1_0 prefix.
Answer 1:
Ion is an open-source format whose text form is a superset of JSON. Have you tried converting your Ion file(s) with AWS Glue? Ion is one of the format options it supports (for input): https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format.html
This QLDB workshop uses Ion in its example. You could explore the CloudFormation template (YAML), or deploy the workflow and dig into the crawler and job it creates for some ideas: https://qldb-immersionday.workshop.aws/en/lab3/task3.html
Check out the Ion cookbook for some additional information: https://amzn.github.io/ion-docs/guides/cookbook.html
And the specs: https://amzn.github.io/ion-docs/docs/spec.html
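To make the two differences from JSON described in the question concrete (the leading $ion_1_0 version marker and the unquoted field names), here is a minimal Python sketch using only the standard library. It is a naive illustration, not a real Ion parser: the function name and the sample record are made up for this example, and the regex only copes with simple records whose string values do not themselves contain text that looks like `,name:`. Ion-specific types such as decimals, timestamps, or annotations are not handled, so for real export data a proper Ion library or the Glue route is the safer choice.

```python
import json
import re

# Version marker that Ion text files may start with.
ION_VERSION_MARKER = "$ion_1_0"

# A bare (unquoted) field name that follows '{' or ',' and precedes ':'.
_BARE_KEY = re.compile(r'([{,]\s*)([A-Za-z_][A-Za-z0-9_]*)(\s*:)')


def naive_ion_text_to_json(line: str) -> dict:
    """Convert one simple Ion text record to a dict (hypothetical helper).

    Assumption: the record uses only JSON-compatible values and no string
    value contains something that looks like a bare field name.
    """
    text = line.strip()
    # Drop the leading version marker if present.
    if text.startswith(ION_VERSION_MARKER):
        text = text[len(ION_VERSION_MARKER):].lstrip()
    # Wrap bare field names in double quotes so json.loads accepts them.
    quoted = _BARE_KEY.sub(r'\1"\2"\3', text)
    return json.loads(quoted)


# Made-up sample record, shaped like the format described in the question.
sample = '$ion_1_0 {Item:{id:"abc-123",name:"widget",tags:["a","b"]}}'
record = naive_ion_text_to_json(sample)
print(record["Item"]["name"])
```

If something like this were used at all, it would be as a one-off pre-processing step to produce JSON lines that a standard JSON SerDe can read.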
Source: https://stackoverflow.com/questions/65433335/athena-ddl-for-ion-format