Question:
I am new to AWS Data Pipeline. I created a pipeline that successfully pulls all the content from RDS into an S3 bucket, and everything works: I can see my .csv file in the S3 bucket. But my table stores Spanish names, and in the CSV I see "Garc�a" instead of "García".
Answer 1:
It looks like the wrong codepage is being used. Reference the correct codepage and you should be fine. The following topic might help: Text files uploaded to S3 are encoded strangely?
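To see concretely what a codepage mismatch does, here is a small standalone Java sketch (illustrative only, not part of the pipeline): text encoded as latin1 but decoded as UTF-8 turns the í in "García" into the U+FFFD replacement character, which is exactly the "Garc�a" symptom above.

import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        // Encode "García" as latin1 (ISO-8859-1): í becomes the single byte 0xED.
        byte[] latin1Bytes = "García".getBytes(StandardCharsets.ISO_8859_1);

        // Decode those bytes as UTF-8: 0xED followed by 'a' is an invalid
        // UTF-8 sequence, so it is replaced with U+FFFD and prints as "Garc�a".
        String misread = new String(latin1Bytes, StandardCharsets.UTF_8);
        System.out.println(misread); // Garc�a

        // Decoding with the charset that was actually used round-trips cleanly.
        System.out.println(new String(latin1Bytes, StandardCharsets.ISO_8859_1)); // García
    }
}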
Answer 2:
AWS Data Pipeline is implemented in Java and uses JDBC (Java Database Connectivity) drivers (specifically, MySQL Connector/J for MySQL in your case) to connect to the database. According to the Using Character Sets and Unicode section of the documentation, the character set used by the connector is determined automatically from the character_set_server system variable on the RDS/MySQL server, which is set to latin1 by default.
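Outside of Data Pipeline, the same Connector/J override can be expressed directly in the JDBC connection URL via the useUnicode and characterEncoding parameters. The sketch below is a minimal standalone example (the host, database name, and credentials are placeholders, and Connector/J must be on the classpath):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CharsetCheck {
    public static void main(String[] args) throws Exception {
        // useUnicode/characterEncoding override whatever character set
        // Connector/J would otherwise detect from character_set_server.
        String url = "jdbc:mysql://my-rds-host:3306/mydb"
                   + "?useUnicode=true&characterEncoding=UTF-8";
        try (Connection conn = DriverManager.getConnection(url, "myuser", "mypassword");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SHOW VARIABLES LIKE 'character%'")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " = " + rs.getString(2));
            }
        }
    }
}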
To confirm whether this setting is correct for your application, run SHOW VARIABLES LIKE 'character%'; in a MySQL client.
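For reference, a typical result on a default RDS/MySQL instance might look like the following (this sample is illustrative; your values will vary by configuration):

mysql> SHOW VARIABLES LIKE 'character%';
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | utf8   |
| character_set_connection | utf8   |
| character_set_database   | latin1 |
| character_set_filesystem | binary |
| character_set_results    | utf8   |
| character_set_server     | latin1 |
| character_set_system     | utf8   |
+--------------------------+--------+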
If the server character set is indeed wrong, you have two options to correct this:
- Set character_set_server to utf8 on your RDS/MySQL server. To make this change permanent from the RDS console, see Modifying Parameters in a DB Parameter Group for instructions.
- Pass additional JDBC properties in your Data Pipeline configuration to override the character set used by the JDBC connection. For this approach, add the following JDBC properties to your RdsDatabase or JdbcDatabase object (see properties reference):
"jdbcProperties": "useUnicode=true,characterEncoding=UTF-8"
Answer 3:
This question looks similar to Text files uploaded to S3 are encoded strangely? If that matches your situation, please see my answer there.
Source: https://stackoverflow.com/questions/41623185/aws-datapipeline-issue-with-accented-characters