Question
Consider the following row in a CSV file:
1,0,True,"{""foo"":null,""bar"":null}",0,1
The comma inside the quoted field (the one between null and ""bar"") is part of a column. That is, the full text "{""foo"":null,""bar"":null}" is the value of a single column. However, AWS Athena is interpreting that comma as a column-delimiting comma, incorrectly splitting the text into multiple columns.
I know I could change the column delimiter to something else to avoid this problem. My question is: Is this a bug in AWS Athena / Presto? How can I escape these commas?
Answer 1:
If your data is enclosed in double quotes, you need to use the OpenCSVSerDe.
For the sample data below, the following table definition works:
1,0,True,"{""foo"":null,""bar"":null}",0,1
CREATE EXTERNAL TABLE `extra_comma`(
`a` string COMMENT 'from deserializer',
`b` string COMMENT 'from deserializer',
`c` string COMMENT 'from deserializer',
`d` string COMMENT 'from deserializer',
`e` string COMMENT 'from deserializer',
`f` string COMMENT 'from deserializer'
)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://aws-glue-stackoverflow/comma_in_data/'
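The definition above relies on the SerDe's default delimiter, quote, and escape characters. As a minimal sketch, the same table can be declared with those properties spelled out, which may make the quoting behavior easier to see and adjust. The table name extra_comma_explicit is hypothetical and not from the original answer; the location is reused from the definition above. Run each statement as its own query.

```sql
-- Hypothetical variant of the table above with the OpenCSVSerde
-- properties written out instead of left at their defaults.
CREATE EXTERNAL TABLE `extra_comma_explicit`(
  `a` string,
  `b` string,
  `c` string,
  `d` string,
  `e` string,
  `f` string
)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',   -- column delimiter
  'quoteChar'     = '"',   -- commas inside quoted fields are treated as data
  'escapeChar'    = '\\'   -- backslash as the escape character
)
STORED AS TEXTFILE
LOCATION
  's3://aws-glue-stackoverflow/comma_in_data/';

-- Quick check: if the quoted field is parsed as one value,
-- column d should hold the whole JSON text and f the last field.
SELECT a, d, f
FROM extra_comma_explicit
LIMIT 10;
```

If the SELECT shows the JSON text split across d and e, the table is most likely still being read with a SerDe that does not honor quotes.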
Source: https://stackoverflow.com/questions/53527586/presto-athena-loading-of-a-csv-file-with-quote-escaped-commas