Special characters in AWS Athena show up as question marks

Posted by 我的梦境 on 2021-01-29 11:08:42

Question


I've created a table in AWS Athena from a CSV file that contains the special characters "æøå". These show up as � in the output. The CSV file was saved with "Unicode" encoding. I've also tried changing the encoding to UTF-8, with no luck. I uploaded the CSV to S3 and then created the table in Athena using the following DDL:

CREATE EXTERNAL TABLE `regions_dk`(
  `postnummer` string COMMENT 'from deserializer', 
  `kommuner` string COMMENT 'from deserializer', 
  `regioner` string COMMENT 'from deserializer')
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
WITH SERDEPROPERTIES ( 
  'separatorChar'='\;') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://bucket/path'
TBLPROPERTIES (
  'classification'='csv')

I have another table that also contains the characters "æøå", which I added using an ETL script, and that one displays them correctly.

What am I overlooking?
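
One way to narrow this down is to check which encoding the uploaded file really has, since some Windows editors label UTF-16 as "Unicode". Below is a minimal Python sketch for that check, not a confirmed fix. It assumes the table reader expects UTF-8 bytes (the question does not confirm this) and that a file with no byte-order mark that fails to decode as UTF-8 is Latin-1. The helper names `guess_encoding` and `to_utf8` are made up for illustration.

```python
import codecs

# Byte-order marks, checked longest-first so that the UTF-32 LE mark
# (FF FE 00 00) is not mistaken for the UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(raw: bytes) -> str:
    """Guess a Python codec name from a BOM, else by trial decoding."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: anything else is a single-byte Latin-1 export.
        return "latin-1"


def to_utf8(raw: bytes) -> bytes:
    """Re-encode the bytes as UTF-8 without a byte-order mark."""
    return raw.decode(guess_encoding(raw)).encode("utf-8")
```

To use it, read the local copy of the CSV in binary mode, print `guess_encoding(raw)` to see what the file actually contains, and write `to_utf8(raw)` to a new file before uploading that file to S3 again. If the guess is already `utf-8`, the encoding of the file is probably not the cause.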

Source: https://stackoverflow.com/questions/51783227/special-characters-in-aws-athena-show-up-as-question-marks
