AWS Datapipeline - issue with accented characters

Submitted by 青春壹個敷衍的年華 on 2019-12-24 11:37:52

Question


I am new to AWS Data Pipeline. I created a pipeline that successfully pulls all the content from RDS into an S3 bucket, and I can see my .csv file there. However, I store Spanish names in my table, and in the CSV I see "Garc�a" instead of "García".


Answer 1:


It looks like the wrong code page is being used. Reference the correct code page and you should be fine. The following topic might help: Text files uploaded to S3 are encoded strangely?




Answer 2:


AWS DataPipeline is implemented in Java, and uses JDBC (Java Database Connectivity) drivers (specifically, MySQL Connector/J for MySQL in your case) to connect to the database. According to the Using Character Sets and Unicode section of the documentation, the character set used by the connector is automatically determined based on the character_set_server system variable on the RDS/MySQL server, which is set to latin1 by default.

If this setting is not correct for your application (run SHOW VARIABLES LIKE 'character%'; in a MySQL client to confirm), you have two options to correct this:
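As a quick sanity check, the relevant variables can be inspected from any MySQL client; the `latin1` values shown in the comments below are what a default RDS/MySQL instance typically reports, not guaranteed output:

```sql
-- Inspect the server's character set configuration.
SHOW VARIABLES LIKE 'character%';
-- On a default RDS/MySQL instance, character_set_server is typically
-- 'latin1'; for UTF-8 data it should be 'utf8' (or 'utf8mb4').
```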

  1. Set character_set_server to utf8 on your RDS/MySQL server. To make this change permanently from the RDS console, see Modifying Parameters in a DB Parameter Group for instructions.
  2. Pass additional JDBC properties in your DataPipeline configuration to override the character set used by the JDBC connection. For this approach, add the following JDBC properties to your RdsDatabase or JdbcDatabase object (see properties reference):

    "jdbcProperties": "useUnicode=true,characterEncoding=UTF-8"
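As a sketch of where that property sits, a minimal `JdbcDatabase` object in the pipeline definition might look like the following; the `id`, endpoint, database name, and credentials are placeholders for your own values:

```json
{
  "id": "MyRdsDatabase",
  "type": "JdbcDatabase",
  "connectionString": "jdbc:mysql://my-rds-endpoint:3306/mydb",
  "jdbcDriverClass": "com.mysql.jdbc.Driver",
  "username": "db_user",
  "*password": "db_password",
  "jdbcProperties": "useUnicode=true,characterEncoding=UTF-8"
}
```

With `jdbcProperties` set, Connector/J uses UTF-8 for the connection regardless of the server's `character_set_server` default, so no server-side change (or reboot) is required.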




Answer 3:


This question looks similar to Text files uploaded to S3 are encoded strangely? If so, please see my answer there.



Source: https://stackoverflow.com/questions/41623185/aws-datapipeline-issue-with-accented-characters
