We have a file in S3 that is loaded in to Redshift via the COPY
command. The import is failing because a VARCHAR(20)
value contains an Ä
Please check below link
http://docs.aws.amazon.com/redshift/latest/dg/multi-byte-character-load-errors.html
You should use ACCEPTINVCHARS in you copy command. Details here
http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html#acceptinvchars
the byte length for your varchar
column just needs to be larger.
Multi-byte characters (UTF-8) are supported in the varchar
data type, however the length that is provided is in bytes, NOT characters.
AWS documentation for Multibyte Character Load Errors states the following:
VARCHAR
columns accept multibyte UTF-8 characters, to a maximum of four bytes.
Therefore if you want the character Ä
to be allowed, then you need to allow 2 bytes for this character, instead of 1 byte.
AWS documentation for VARCHAR or CHARACTER VARYING states the following:
... so a
VARCHAR(120)
column consists of a maximum of 120 single-byte characters, 60 two-byte characters, 40 three-byte characters, or 30 four-byte characters.
For a list of UTF-8 characters and their byte lengths, this is a good reference: Complete Character List for UTF-8
Detailed information for the Unicode Character 'LATIN CAPITAL LETTER A WITH DIAERESIS' (U+00C4) can be found here.
I have a similar experience which some only characters like Ä
were not copied correctly when loading mysqldump data into our Redshift cluster. It was because the encoding of mysqldump was latin1
which is the default character set of mysql. It's better to check the character encoding of the files for COPY first. If the encoding of your files are not UTF-8, you have to encode your files.
You need to increase the size of your varchar column. Check the stl_load_errors table, see what is the actual field value length for the failed rows and accordingly increase the size. EDIT: Just realized this is a very old post, anyways if someone need it..
Using "ACCEPTINVCHARS ESCAPE" in the copy command solved the issue for us with minor data alteration.