utf-8 to iso-8859-1 encoding problem

时光毁灭记忆、已成空白 提交于 2019-12-13 19:15:56

问题


I'm trying preview the latest post from an rss feed on another website. The feed is UTF-8 encoded, whilst the website is ISO-8859-1 encoded. When displaying the title, I'm using;

 $post_title = 'Blogging – does it pay the bills?';

 echo mb_convert_encoding($post_title, 'iso-8859-1','utf-8');

 // returns: Blogging ? does it pay the bills?
 // expected: Blogging - does it pay the bills?

Note that the hyphen I'm expecting isn't a normal minus sign but some big-ass uber dash. Well, a few pixels longer anyway. :) Not sure how else to describe it as my keyboard can't produce that character...


回答1:


I suspect you mean an Em Dash (—). ISO-8859-1 doesn't include this character, so you aren't going to have much luck converting it to that encoding.

You could use htmlentities(), but I'd suggest moving off ISO-8859-1 to UTF-8 for publication.




回答2:


mb_convert_encoding only converts the internal encoding - it won't actually change the byte sequences for characters from one character set to another. For that you need iconv.

mb_internal_encoding( 'UTF-8' );
ini_set( 'default_charset', 'ISO-8859-1' );

$post_title = 'Blogging — does it pay the bills?'; // I used the actual m-dash here to best mimic your scenario

echo iconv( 'UTF-8', 'ISO-8859-1//TRANSLIT', $post_title );

Or, as others have said, just convert out-of-range characters to html entities.




回答3:


I suppose the following:

  • Your file is actually encoded with UTF-8
  • Your editor interprets the file with Windows-1252

The reason for that is that your EM DASH character (U+2014) is represented by –. That’s exactly what you get when you interpret the UTF-8 code word of that character (0xE28094) with Windows-1252 (0xE2=â, 0x80=, 0x94=). So you first need to fix your editor encoding.

And the reason for the ? in your output is that ISO 8859-1 doesn’t contain the EM DASH character.




回答4:


It's probably an em dash (U+2014), and what you're trying to do isn't converting the encoding, because the hyphen is a different character. In other words, you want to search for such characters and replace them manually.

Better yet, just switch the website to UTF-8. It largely coincides with Latin-1 and is more appropriate for a website in 2009.



来源:https://stackoverflow.com/questions/1567100/utf-8-to-iso-8859-1-encoding-problem

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!