I'm working on a web crawler that grabs data from sites all over the world and has to deal with distinct languages and encodings.
Currently I'm using the following
Rather than blindly trying to detect the encoding, you should first check whether the page you downloaded declares a character set. It may appear in the HTTP response header, for example:
Content-Type: text/html; charset=utf-8
Or in the HTML as a meta tag, for example:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Only if neither is available should you try to guess the encoding, with mb_detect_encoding() or another method.
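A minimal sketch of that order of checks (the helper name extractCharset(), its inputs, and the regexes are illustrative assumptions, not a battle-tested parser):

// Prefer the declared charset; only guess as a last resort.
// $headers is the raw response header block, $html the body (assumed inputs).
function extractCharset($headers, $html) {
    // 1. HTTP response header, e.g. "Content-Type: text/html; charset=utf-8"
    if (preg_match('/charset=([\w-]+)/i', $headers, $m)) return strtoupper($m[1]);
    // 2. HTML meta tag in the document itself
    if (preg_match('/<meta[^>]+charset=["\']?([\w-]+)/i', $html, $m)) return strtoupper($m[1]);
    // 3. Guess; mb_detect_encoding() returns false if nothing matches
    return mb_detect_encoding($html, 'UTF-8, ISO-8859-1, Windows-1251', true);
}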
It's not possible to detect the character set of a string with 100% accuracy, since some character sets are subsets of others. Set the character set explicitly whenever you can, and avoid mixing iconv and mbstring functions. I recommend using a function like the one below and supplying the $from charset whenever possible:
function convertEncoding($str, $from = 'auto', $to = 'UTF-8') {
    // mb_detect_encoding() returns false when it cannot decide;
    // fall back to ISO-8859-1 (an assumption) rather than pass false on.
    if ($from == 'auto') $from = mb_detect_encoding($str) ?: 'ISO-8859-1';
    return mb_convert_encoding($str, $to, $from);
}
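For example (the variable name and the Windows-1251 source are just an illustration):

$utf8 = convertEncoding($body, 'Windows-1251'); // explicit source charset, no guessing
$utf8 = convertEncoding($body);                 // falls back to mb_detect_encoding()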
You can try utf8_encode($str), but note that it only converts from ISO-8859-1 (Latin-1) to UTF-8 and will mangle input in any other encoding.
http://www.php.net/manual/en/function.utf8-encode.php#89789
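A minimal example of that conversion (0xE9 is é in ISO-8859-1):

$latin1 = "Caf\xE9";            // "Café" as ISO-8859-1 bytes
$utf8   = utf8_encode($latin1); // now valid UTF-8: "Café"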
Or, once the content has been converted, you can replace the Content-Type meta tag in the head of the crawled content with
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
so that the stored markup declares the encoding it now actually uses.
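A sketch of that rewrite with preg_replace(), assuming the tag appears in the common http-equiv form (real-world markup varies, so the regex is an assumption):

// Rewrite the declared charset after the body has been converted to UTF-8.
$html = preg_replace(
    '/<meta\s+http-equiv=["\']Content-Type["\'][^>]*>/i',
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />',
    $html
);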