I fetch a UTF-8 page in Russian using cURL. If I echo the text, it displays correctly. Then I use this code:
$dom = new domDocument;
/*** load the html
I suggest calling mb_convert_encoding before loading the UTF-8 page.
$dom = new DOMDocument();
$html = mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8");
@$dom->loadHTML($html);
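One caveat: as of PHP 8.2 the 'HTML-ENTITIES' target for mb_convert_encoding is deprecated. A minimal sketch of the same idea using mb_encode_numericentity is below. The sample markup and variable names are made up for illustration, and it assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical sample markup: a UTF-8 fragment with no charset declaration.
$html = '<p>Привет</p>';

// Convert every non-ASCII code point (0x80 and above) to a numeric entity,
// so the markup handed to loadHTML() is pure ASCII.
$map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$ascii = mb_encode_numericentity($html, $map, 'UTF-8');

$dom = new DOMDocument();
@$dom->loadHTML($ascii);

// DOM string properties are returned as UTF-8.
$text = $dom->getElementsByTagName('p')->item(0)->nodeValue;
echo $text, PHP_EOL;
```

Since the parser only ever sees ASCII, its guess about the input encoding no longer matters.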
Alternatively, you could try this:
$dom = new DOMDocument('1.0', 'UTF-8');
@$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
..........
echo html_entity_decode($cols->item(2)->nodeValue, ENT_QUOTES, "UTF-8");
..........
DOMDocument cannot recognize the HTML's encoding unless the markup declares it. You can try something like:
$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="UTF-8">' . $html);
// taken from http://php.net/manual/en/domdocument.loadhtml.php#95251
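To see the effect of the encoding hint in isolation, here is a small self-contained sketch. The sample markup and variable names are made up for illustration, and whether the unhinted version actually comes back garbled may depend on your libxml2 version.

```php
<?php
// Hypothetical sample markup: a UTF-8 fragment with no charset declaration.
$html = '<p>Привет, мир</p>';

// Without a hint, the parser has to guess the encoding,
// so the Cyrillic text may come back garbled.
$plain = new DOMDocument();
@$plain->loadHTML($html);
$unhinted = $plain->getElementsByTagName('p')->item(0)->nodeValue;

// With the XML encoding hint prepended, the same markup is read as UTF-8.
$hinted = new DOMDocument();
@$hinted->loadHTML('<?xml encoding="UTF-8">' . $html);
$text = $hinted->getElementsByTagName('p')->item(0)->nodeValue;

echo $text, PHP_EOL;
```

The hint is only there to steer the parser. If you later call saveHTML(), you may want to strip the extra node it leaves in the document first.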
$html = mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8");
The same thing worked for phpQuery.
P.S. I use phpQuery::newDocument($html);
instead of $dom->loadHTML($html);