How do I convert Word smart quotes and em dashes in a string?

星月不相逢 2020-11-29 03:11

I have a form with a textarea. Users enter a block of text which is stored in a database.

Occasionally a user will paste text from Word containing smart quotes or em

13 Answers
  • 2020-11-29 03:22

    This is an unfortunately all-too-common problem, not helped by PHP's very poor handling of character sets.

    What we do is force the text through iconv:

    // Convert input data to UTF8, ignore any odd (MS Word..) chars
    // that don't translate
    $input = iconv("ISO-8859-1","UTF-8//IGNORE",$input);
    

    The //IGNORE flag means that anything that can't be translated will be thrown away.

    The PHP manual's iconv page says the same: "If you append the string //IGNORE, characters that cannot be represented in the target charset are silently discarded."
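Smart quotes pasted from Word are Windows-1252 characters, and a UTF-8 page will usually submit them as valid UTF-8 already, so converting everything unconditionally can mangle good input. A minimal sketch of a more cautious conversion follows. The function name `to_utf8` is hypothetical, and the fallback to Windows-1252 for anything that is not valid UTF-8 is an assumption about where the text came from.

```php
<?php
// Hypothetical helper: return $input as UTF-8.
function to_utf8(string $input): string
{
    // An empty pattern with the /u modifier only matches valid UTF-8,
    // so this is a cheap validity check that needs no extra extension.
    if (preg_match('//u', $input) === 1) {
        return $input; // already UTF-8, leave it alone
    }

    // Assumption: anything else is Windows-1252 (typical for old Word pastes).
    $converted = iconv('Windows-1252', 'UTF-8', $input);
    if ($converted === false) {
        throw new RuntimeException('Input could not be converted to UTF-8');
    }
    return $converted;
}
```

For example, `to_utf8("\x93Hi\x94")` returns the UTF-8 string `“Hi”`, while input that is already valid UTF-8 passes through unchanged.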

  • 2020-11-29 03:26

    This sounds like a Unicode issue. Joel Spolsky has a good jumping off point on the topic: http://www.joelonsoftware.com/articles/Unicode.html

  • 2020-11-29 03:27

    The problem is the MySQL charset. I fixed my issue with this line of code:

    mysql_set_charset('utf8',$link); 
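`mysql_set_charset` belongs to the old mysql extension, which was removed in PHP 7.0. A rough equivalent with PDO is sketched below. The helper name `utf8_dsn`, the host and database names, and the choice of `utf8mb4` (MySQL's name for full four-byte UTF-8) instead of `utf8` are assumptions for illustration.

```php
<?php
// Hypothetical helper: build a PDO MySQL DSN that pins the connection charset.
function utf8_dsn(string $host, string $dbname): string
{
    // The charset parameter of the DSN sets the connection character set.
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbname);
}

// Usage is left commented out because it needs a reachable MySQL server
// and the pdo_mysql extension; $user and $password are placeholders.
// $pdo = new PDO(utf8_dsn('localhost', 'app'), $user, $password);
// $pdo->exec("SET NAMES 'utf8mb4'"); // explicit form of the same setting
```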
    
  • 2020-11-29 03:27

    This may not be the best solution, but I'd try testing to find out what PHP sees. Let's say it sees "–" (there are a few other possibilities, like a plain "“" or an entity-encoded form of it). Then do a str_replace to get rid of all of those and replace them with normal quotes before storing the text in the database.

    The better solution would probably involve making the end-to-end data passing all UTF-8, as people are trying to help with in other answers.
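The str_replace idea can be sketched as follows, assuming the text has already reached PHP as UTF-8. The function name `plain_punctuation` and the particular replacements (for instance `--` for an em dash) are illustrative choices, not something the answer specifies.

```php
<?php
// Hypothetical helper: swap typographic punctuation for ASCII equivalents.
function plain_punctuation(string $text): string
{
    $map = [
        "\u{2018}" => "'",   // left single quotation mark
        "\u{2019}" => "'",   // right single quotation mark / apostrophe
        "\u{201C}" => '"',   // left double quotation mark
        "\u{201D}" => '"',   // right double quotation mark
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '--',  // em dash
        "\u{2026}" => '...', // horizontal ellipsis
    ];

    // str_replace accepts parallel arrays of search and replacement strings.
    return str_replace(array_keys($map), array_values($map), $text);
}
```

The `"\u{...}"` escapes need PHP 7 or later and keep the source file free of non-ASCII bytes, so the result does not depend on the encoding the PHP file itself is saved in.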

  • 2020-11-29 03:28

    We often use the standard string replace functions for that. Even though the nature of ASCII/Unicode in that context is pretty murky, it works. Just make sure your PHP file is saved in the same encoding as the text you are replacing.

  • 2020-11-29 03:33

    You have to be sure your database connection is configured to accept and provide UTF-8 from and to the client (otherwise it will convert to the "default", which is usually latin1).

    In practice this means running the query SET NAMES 'utf8';

    http://www.phpwact.org/php/i18n/utf-8/mysql

    Also, smart quotes are part of the Windows-1252 character set, not ISO-8859-1 (Latin-1). Not very relevant to your problem, but just FYI. The euro symbol is in there as well.
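The difference between the two character sets is easy to see by converting the same byte both ways. This is only a small demonstration using iconv; the variable names are arbitrary.

```php
<?php
// In Windows-1252, byte 0x93 is the left double quotation mark.
$byte = "\x93";

// Read as ISO-8859-1, the byte becomes the invisible control character U+0093.
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $byte);

// Read as Windows-1252, it becomes U+201C, the left double quotation mark.
$asCp1252 = iconv('Windows-1252', 'UTF-8', $byte);

// The euro sign sits in the same 0x80-0x9F range of Windows-1252.
$euro = iconv('Windows-1252', 'UTF-8', "\x80");

echo bin2hex($asLatin1), "\n"; // c293
echo bin2hex($asCp1252), "\n"; // e2809c
echo bin2hex($euro), "\n";     // e282ac
```

Because ISO-8859-1 assigns every byte a code point, a conversion from it never fails and never triggers //IGNORE; Word punctuation simply turns into control characters instead of being dropped.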
