Question
I'm storing responses from the Google geocoder in my database.
One such response has an accent mark over the o. The town is Rincon. (I would put the accent mark here to show you, but I don't know how.)
When the response is stored in my MySQL database it looks like this: Rincón
I realize that this has to do with collation, but I'm nervous about making changes to the database because I have so much data in there. The collation currently applied to this field is utf8_general_ci.
Can anyone advise: 1) Is utf8_general_ci the correct collation? 2) Do I need to somehow specify this collation in my AJAX request?
Thank you....
Answer 1:
UTF-8 is (generally) a “safe” encoding for any character set in the world. (Not always the most efficient, and there are some arguments to be made that Unicode under-represents the CJK scripts with its “Unified Han” model, but moving on…)
However, it's likely that your interface program(s) are not translating to/from UTF-8 properly. For example, ó => Ã³ looks like the UTF-8 data (where one character can be spread across a varying number of bytes) is being presented to you using a single-byte European encoding, like ISO-8859-15 or Windows CP-1252 or similar.
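As a rough illustration of that mis-reading (this sketch is not from the original answer), Perl's core Encode module can reproduce it; the town name is just a sample value:

    use strict;
    use warnings;
    use Encode qw(encode decode);

    binmode(STDOUT, ':encoding(UTF-8)');    # print characters, not raw bytes

    my $town    = "Rinc\x{00F3}n";              # "Rincón" as a Perl character string
    my $bytes   = encode('UTF-8', $town);       # the o-acute becomes the two bytes 0xC3 0xB3
    my $mangled = decode('ISO-8859-1', $bytes); # misread those bytes as single-byte Latin-1

    print "$town\n";       # Rincón
    print "$mangled\n";    # RincÃ³n -- the same corruption shown in the question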
You are probably storing the data correctly, but loading it incorrectly. If you're just using the mysql terminal program or similar, make sure that your terminal is set to use UTF-8 (on a Unix/Linux system, your locale should probably be something ending in .utf8, e.g. mine has LANG=en_US.utf8).
If you're pulling data using a GUI tool or similar, check its Settings/Preferences panel for the character set.
If you're getting the mistranslated characters back into an application you've written, look at your language's tools for setting the locale. (Perhaps the INSERT routines have it right, but the SELECT routines have it wrong?)
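For instance, if the application happens to be Perl talking to MySQL through DBI/DBD::mysql, a minimal sketch of putting both the writing and the reading side on a UTF-8 connection could look like this (the database name, credentials, and towns table are placeholders, and mysql_enable_utf8 assumes a reasonably recent DBD::mysql; newer versions also offer mysql_enable_utf8mb4):

    use strict;
    use warnings;
    use DBI;

    # Hypothetical connection details and table -- substitute your own.
    my $dbh = DBI->connect(
        'DBI:mysql:database=geodata;host=localhost',
        'someuser', 'somepassword',
        {
            RaiseError        => 1,
            mysql_enable_utf8 => 1,    # decode/encode UTF-8 on this connection
        },
    );

    # Both the INSERT and the SELECT now run over the same UTF-8-aware handle.
    $dbh->do('INSERT INTO towns (name) VALUES (?)', undef, "Rinc\x{00F3}n");
    my ($name) = $dbh->selectrow_array('SELECT name FROM towns ORDER BY id DESC LIMIT 1');

    binmode(STDOUT, ':encoding(UTF-8)');
    print "$name\n";    # should print Rincón, not RincÃ³n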
And, if this is being sent to the Web, make sure your (XML|HTML|XHTML) files have charset=utf8 declared in the appropriate place(s), or translate back from UTF-8 to the character set of your document (if possible) using something like iconv when inserting text from the database. (Most non-Unicode character sets can only represent a subset of Unicode, of course; e.g. the ISO-8859-15 set does a decent job at covering European languages, but has no support for Cyrillic, Arabic, or CJK writing systems, so it's possible to fail to translate a character.) In Perl, you can pass arguments to open or use binmode to set up a transparent character set translation layer on a "filehandle" stream.
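A short sketch of that last point (the filename here is hypothetical): read a file through a UTF-8 decoding layer and re-encode on output:

    use strict;
    use warnings;

    # Read a file through a transparent UTF-8 decoding layer...
    open(my $in, '<:encoding(UTF-8)', 'towns.txt') or die "towns.txt: $!";

    # ...and have STDOUT encode back to UTF-8 on the way out.
    binmode(STDOUT, ':encoding(UTF-8)');

    while (my $line = <$in>) {
        print $line;    # accented characters survive the round trip
    }
    close($in);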
Source: https://stackoverflow.com/questions/8447784/how-to-store-accent-marks-over-characters-in-my-database