How do I convert Word smart quotes and em dashes in a string?

星月不相逢

I have a form with a textarea. Users enter a block of text which is stored in a database.

Occasionally a user will paste text from Word containing smart quotes or em dashes.

13 Answers

梦谈多话

2020-11-29 03:22
This is an unfortunately all-too-common problem, not helped by PHP's very poor handling of character sets.

What we do is force the text through iconv:
```
// Convert input data to UTF8, ignore any odd (MS Word..) chars
// that don't translate
$input = iconv("ISO-8859-1","UTF-8//IGNORE",$input);
```
The //IGNORE flag means that anything that can't be translated will be thrown away.

If you append the string //IGNORE, characters that cannot be represented in the target charset are silently discarded.
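
One caveat with the call above: if the pasted text arrives as Windows-1252 bytes, declaring the source as ISO-8859-1 will not yield smart quotes, because ISO-8859-1 maps those byte values to control characters. The following is only a sketch of one way to handle that, assuming the input is either already valid UTF-8 or Windows-1252, and that the iconv and mbstring extensions are loaded. The helper name `to_utf8` is made up for illustration.

```php
<?php
// Hypothetical helper: normalise form input to UTF-8.
// Assumption: anything that is not already valid UTF-8 is Windows-1252.
function to_utf8(string $input): string
{
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input; // already valid UTF-8, leave it untouched
    }
    // //IGNORE asks iconv to drop characters it cannot represent in the target.
    $converted = iconv('Windows-1252', 'UTF-8//IGNORE', $input);
    if ($converted === false) {
        throw new RuntimeException('Input could not be converted to UTF-8');
    }
    return $converted;
}
```

Checking for valid UTF-8 first avoids double-encoding text that a browser already submitted as UTF-8.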

南旧

2020-11-29 03:26

This sounds like a Unicode issue. Joel Spolsky has a good jumping-off point on the topic: http://www.joelonsoftware.com/articles/Unicode.html

执笔经年

2020-11-29 03:27
The problem is the MySQL connection charset. I fixed my issue with this line of code:
```
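// Set the connection charset (mysql_* was removed in PHP 7; the mysqli equivalent is mysqli_set_charset($link, 'utf8'))
mysql_set_charset('utf8', $link);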
```

臣服心动

2020-11-29 03:27

This may not be the best solution, but I'd try testing to find out what PHP sees. Let's say it sees "â€“" (there are a few other possibilities, such as a literal "“" or an HTML-entity form of it). Then do a str_replace to get rid of all of those and replace them with normal quotes, before stuffing the answer in a database.
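
To find out what PHP sees, dumping the raw bytes as hex tends to be more reliable than echoing the string, since the browser or terminal applies its own decoding on top. A small sketch follows. The helper name `describe_bytes` is hypothetical.

```php
<?php
// Hypothetical debugging helper: show each byte of a string as hex so the
// actual encoding of pasted text can be identified.
function describe_bytes(string $input): string
{
    // bin2hex() yields two hex digits per byte; chunk_split() adds a space
    // after each pair, and rtrim() drops the trailing one.
    return rtrim(chunk_split(bin2hex($input), 2, ' '));
}

// The left double quote (U+201C) looks different depending on the encoding:
echo describe_bytes("\xE2\x80\x9C"), "\n"; // its UTF-8 encoding
echo describe_bytes("\x93"), "\n";         // its Windows-1252 encoding
```

Seeing `e2 80 9c` for a left double quote would indicate UTF-8 input, while a lone `93` would point to Windows-1252.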

The better solution would probably involve making the end-to-end data passing all UTF-8, as people are trying to help with in other answers.

我寻月下人不归

2020-11-29 03:28

We would often use standard string replace functions for that. Even though the nature of ASCII/Unicode in that context is pretty murky, it works. Just make sure your PHP file is saved in the right encoding, etc.
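
As a rough illustration of the string-replace approach, here is a sketch that assumes the text is already UTF-8 and that plain ASCII substitutes are acceptable. The function name `plain_punctuation` and the choice of replacements are assumptions, not something taken from the answer. Writing the search strings as `\u{...}` escapes (PHP 7+) sidesteps the concern about how the PHP file itself is saved.

```php
<?php
// Map typographic punctuation to plain ASCII.
// Assumption: $text is already UTF-8. The "\u{...}" escapes produce UTF-8
// bytes regardless of the encoding this source file is saved in.
function plain_punctuation(string $text): string
{
    $map = [
        "\u{2018}" => "'",   // left single quote
        "\u{2019}" => "'",   // right single quote
        "\u{201C}" => '"',   // left double quote
        "\u{201D}" => '"',   // right double quote
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '--',  // em dash
        "\u{2026}" => '...', // horizontal ellipsis
    ];
    return str_replace(array_keys($map), array_values($map), $text);
}
```

Because none of the replacement strings contain any of the search strings, the order in which str_replace applies the pairs does not matter here.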

闹比i

2020-11-29 03:33

You have to be sure your database connection is configured to accept and provide UTF-8 from and to the client (otherwise it will convert to the "default", which is usually latin1).

In practice this means running a query SET NAMES 'utf8';
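
With the mysqli extension, the same effect can be had through `set_charset()`, which also tells the client library about the charset. This is a sketch only: the helper name `connect_utf8` and its parameters are placeholders, and it uses `utf8mb4` rather than the `utf8` named above, because MySQL's `utf8` stores at most three bytes per character.

```php
<?php
// Hypothetical helper: open a mysqli connection and set its charset.
// Assumptions: the mysqli extension is available and the caller supplies
// real credentials; nothing here is specific to the asker's setup.
function connect_utf8(string $host, string $user, string $pass, string $db): mysqli
{
    $link = new mysqli($host, $user, $pass, $db);
    // Has the effect of SET NAMES, and keeps real_escape_string() in sync
    // with the connection charset.
    if (!$link->set_charset('utf8mb4')) {
        throw new RuntimeException('Could not set charset: ' . $link->error);
    }
    return $link;
}
```

The table and column character sets need to be UTF-8 as well for the stored data to round-trip.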

http://www.phpwact.org/php/i18n/utf-8/mysql

Also, smart quotes are part of the Windows-1252 character set, not ISO-8859-1 (Latin-1). Not very relevant to your problem, but just FYI. The euro symbol is in there as well.
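
The difference between the two character sets can be seen by decoding the same bytes under each one. A short sketch, assuming the mbstring extension and PHP 7.2+ (for `mb_ord`). The helper name `code_point` is made up for illustration.

```php
<?php
// Decode a single byte under a given charset and report the resulting
// Unicode code point. Assumption: mbstring is loaded.
function code_point(string $byte, string $charset): string
{
    $utf8 = mb_convert_encoding($byte, 'UTF-8', $charset);
    return sprintf('U+%04X', mb_ord($utf8, 'UTF-8'));
}

// 0x80 is the euro sign, 0x93/0x94 the double smart quotes and 0x97 the
// em dash in Windows-1252; ISO-8859-1 maps the same bytes to control codes.
foreach (["\x80", "\x93", "\x94", "\x97"] as $byte) {
    printf(
        "0x%s  Windows-1252: %s  ISO-8859-1: %s\n",
        strtoupper(bin2hex($byte)),
        code_point($byte, 'Windows-1252'),
        code_point($byte, 'ISO-8859-1')
    );
}
```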
