PHP HTML parsing: How can I get the charset value of a web page with Simple HTML DOM Parser?

和自甴很熟 submitted on 2019-12-11 04:00:58

Question


PHP: How can I get the charset value of a web page (utf-8, windows-255, etc.) with Simple HTML DOM Parser?

Note: it has to be done with Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net).

Example 1, charset declaration in the page:

<meta content="text/html; charset=utf-8" http-equiv="Content-Type">

Result: utf-8



Example 2, charset declaration in the page:

<meta content="text/html; charset=windows-255" http-equiv="Content-Type">

Result: windows-255

Edit:

I tried this (but it doesn't work):

$html = file_get_html('http://www.google.com/');
$el=$html->find('meta[content]',0);
echo $el->charset; 

What should I change? (I know that $el->charset doesn't work.)

Thanks


Answer 1:


You'll have to match the string using a regular expression (I hope you have PCRE...).

$el = $html->find('meta[http-equiv=Content-Type]', 0);
if ($el !== null) {
    $fullvalue = $el->content;  // e.g. "text/html; charset=utf-8"
    if (preg_match('/charset=(.+)/', $fullvalue, $matches)) {
        echo $matches[1];       // e.g. "utf-8"
    }
}

Not very robust, but should work.
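One way to make the match less fragile is to tighten the pattern itself. A minimal sketch, assuming the `content` value has already been read from the tag (for example via `$el->content`); `charset_from_content` is a hypothetical helper name, and lowercasing the result is a choice made here so values compare consistently:

```php
<?php
// Hypothetical helper: extract the charset parameter from a Content-Type
// value such as "text/html; charset=utf-8". Unlike /charset=(.+)/ it is
// case-insensitive, tolerates spaces and quotes around the value, and stops
// at the next ';', whitespace or quote instead of swallowing the rest.
function charset_from_content(string $content): ?string
{
    if (preg_match('/charset\s*=\s*["\']?([^\s;"\']+)/i', $content, $matches)) {
        return strtolower($matches[1]);
    }
    return null; // no charset parameter present
}

echo charset_from_content('text/html; charset=utf-8'), "\n";        // utf-8
echo charset_from_content('text/html; charset=windows-255'), "\n";  // windows-255
echo charset_from_content('text/html; Charset="UTF-8"; x=1'), "\n"; // utf-8
```

The character class `[^\s;"']` is what keeps trailing parameters such as `; x=1` out of the result.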




Answer 2:


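// $data holds the raw HTML of the page, e.g. from file_get_contents($url)
$dd = new DOMDocument;
libxml_use_internal_errors(true); // suppress warnings about malformed HTML
$dd->loadHTML($data);
foreach ($dd->getElementsByTagName("meta") as $meta) {
    if (strtolower($meta->getAttribute("http-equiv")) == "content-type") {
        $v = $meta->getAttribute("content");
        // use a separate $matches array instead of overwriting the loop variable
        if (preg_match("#.+?/.+?;\\s?charset\\s?=\\s?(.+)#i", $v, $matches))
            echo $matches[1];
    }
}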

Note that the DOM extension implicitly converts all the data to UTF-8.
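Pages written for HTML5 often declare the encoding as `<meta charset="...">` rather than the http-equiv form, and the loop above would miss that. A sketch extending the same DOMDocument approach to both forms; `charset_from_html` is a hypothetical name, and it assumes the DOM extension is available and that `$html` holds the page source:

```php
<?php
// Sketch: return the charset declared in an HTML document, checking both the
// HTML5 form <meta charset="..."> and the older
// <meta http-equiv="Content-Type" content="...; charset=..."> form.
// Requires the DOM extension (enabled in most PHP builds).
function charset_from_html(string $html): ?string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // HTML5 form: <meta charset="utf-8">
        if ($meta->hasAttribute('charset')) {
            return strtolower(trim($meta->getAttribute('charset')));
        }
        // Older form: <meta http-equiv="Content-Type" content="text/html; charset=...">
        if (strtolower($meta->getAttribute('http-equiv')) === 'content-type'
            && preg_match('/charset\s*=\s*["\']?([^\s;"\']+)/i',
                          $meta->getAttribute('content'), $matches)) {
            return strtolower($matches[1]);
        }
    }
    return null; // no declaration found in any <meta> tag
}
```

Other `<meta>` tags that merely carry a `content` attribute (a description, for instance) are skipped, because only tags with `charset` or `http-equiv="Content-Type"` are inspected.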




Answer 3:


Thanks for MvanGeest's answer. I fixed it a bit and it works perfectly:

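$html = file_get_html('http://www.google.com/');
// Takes the first <meta> with a content attribute; this assumes it is
// the Content-Type declaration, which is not guaranteed on every page
$el = $html->find('meta[content]', 0);
$fullvalue = $el->content;
preg_match('/charset=(.+)/', $fullvalue, $matches);
// Strips the leading "charset=" from the full match (same as $matches[1])
echo substr($matches[0], strlen("charset="));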


Source: https://stackoverflow.com/questions/3356067/php-html-parsing-how-can-be-taken-charset-value-of-webpage-with-simple-html
