Question
I've created a table in AWS Athena from a CSV file that contains the special characters "æøå". These show up as � in the query output. The CSV file was saved with "Unicode" encoding; I've also tried re-saving it as UTF-8, with no luck. I uploaded the CSV to S3 and then created the table in Athena using the following DDL:
CREATE EXTERNAL TABLE `regions_dk`(
`postnummer` string COMMENT 'from deserializer',
`kommuner` string COMMENT 'from deserializer',
`regioner` string COMMENT 'from deserializer')
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'separatorChar'='\;')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://bucket/path'
TBLPROPERTIES (
'classification'='csv')
I have another table that also contains the characters "æøå", which I added using an ETL script, and there the characters display correctly.
What am I overlooking?
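For anyone trying to reproduce this, one thing worth checking is which bytes actually end up in the uploaded file, since "Unicode" in an editor's save dialog often means UTF-16 rather than UTF-8. The following is a minimal sketch, not a confirmed fix: the function names are hypothetical, and falling back to Latin-1 when the bytes are not valid UTF-8 is an assumption that happens to suit "æøå".

```python
import codecs


def detect_encoding(data: bytes) -> str:
    """Best-effort guess at the encoding of raw CSV bytes."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # UTF-8 with a byte order mark
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # what many editors label "Unicode"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: anything else is treated as Latin-1, a common
        # single-byte encoding for Danish text.
        return "latin-1"


def to_utf8(data: bytes) -> bytes:
    """Re-encode the bytes as UTF-8 without a byte order mark."""
    return data.decode(detect_encoding(data)).encode("utf-8")
```

Reading a local copy of the file in binary mode (`open(path, "rb").read()`), passing the bytes to `detect_encoding`, and writing the result of `to_utf8` to a new file before uploading it to S3 would show whether the encoding is the culprit.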
Source: https://stackoverflow.com/questions/51783227/special-characters-in-aws-athena-show-up-as-question-marks