问题
I ran into a new problem that I've never seen before: My client is adding files to a project we built and some of the filenames have special characters in them because some of the words are spanish.
For example a file I'm testing has an á in it. I am calling that image in a css file as a background image but in Safari it doesnt show up. But it does on FF and Chrome.
As a test I pasted the link into the browser and the same thing. Works on FF and Chrome but Safari throws an error. So the language characters are throwing it I guess?
Firefox converts the following url and changes the á to a%CC%81 and loads the image.
http://www.themediacouncil.com/test/nonascii/LA-MAR_Cebiche-Clássico_foto-Henrique-Peron-470x120-1371827671.jpg
You can see it breaks above... but FF and Chrome convert that to: http://www.themediacouncil.com/test/nonascii/LA-MAR_Cebiche-Cla%CC%81ssico_foto-Henrique-Peron-470x120-1371827671.jpg
You can also see this in action here: http://jsfiddle.net/Md4gZ/2/
.testbox {
width:340px;
height:100px;
background:url('http://www.themediacouncil.com/test/nonascii/LA-MAR_Cebiche-Clássico_foto-Henrique-Peron-470x120-1371827671.jpg') no-repeat top left;
}
So whats the right way to handle this. I'm developing in PHP and WORDPRESS. I'd rather not have to tell the client to go back and replace all files with special characters.
Any help is appreciated. Thanks!
回答1:
I believe what is becoming the standard is to convert non-ascii characters to UTF-8 byte sequences, and include those sequences as %HH hex codes in the URL. The á character is U+00E1 (Unicode), which in UTF-8 makes the two bytes 0xC3 0xA1
. Hence, Clássico
would become Cl%C3%A1ssico
.
The conversion you report from Firefox, Cla%CC%81ssico
, did this slightly differently: it changed the á into a followed by U+0301, the COMBINING ACUTE ACCENT character. In UTF-8, U+0301 makes 0xCC 0x81
.
Which representation you should choose – unicode “á” or “a followed by combining accent” – depends on what the web server needs for matching the right thing. In your case, maybe the filename actually contains the combining-character accent, and that's why it worked (hard to tell).
Another, older, way to handle non-ascii latin characters is to use an 8-bit latin charset representation (ISO-8859-1 or something similar, such as Windows-1252) and encode that as one byte. That would make Clássico
into Cl%E1ssico
. But since this only works for latin charsets, and is ambiguous for some of their characters, it is hopefully and probably disappearing.
来源:https://stackoverflow.com/questions/17242846/non-ascii-characters-in-url