PHP DOMDocument loadHTML not encoding UTF-8 correctly

梦如初夏 2020-11-22 15:11

I'm trying to parse some HTML using DOMDocument, but when I do, I suddenly lose my encoding (at least that is how it appears to me).

$profile = "

        
13 Answers
  • 2020-11-22 15:46

    Use this to get the correct result:

    $dom = new DOMDocument();
    $dom->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $profile);
    echo $dom->saveHTML();
    echo $profile;
    

    This operation:

    mb_convert_encoding($profile, 'HTML-ENTITIES', 'UTF-8');
    

    is a bad approach. $profile may already contain entities such as &lt; and &gt;, and mb_convert_encoding will not escape them a second time. That opens a hole for XSS and produces incorrect HTML.
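
    For reference, here is a minimal self-contained sketch of the meta-tag approach. The $profile value is a hypothetical sample (the one in the question is cut off), and reading the text back through textContent is only one way to check that the characters survived.

    ```php
    <?php
    // Hypothetical sample input: UTF-8 text plus an already-escaped entity.
    $profile = '<p>イリノイ州シカゴにて &lt;b&gt;</p>';

    // Prepending a charset declaration tells libxml to read the markup as UTF-8
    // rather than falling back to its default single-byte encoding.
    $meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';

    $dom = new DOMDocument();
    $dom->loadHTML($meta . $profile);

    // The DOM extension returns node text as UTF-8; the entity is decoded once.
    $text = $dom->getElementsByTagName('p')->item(0)->textContent;
    echo $text, PHP_EOL;

    // Serializing the node escapes the angle brackets again.
    $html = $dom->saveHTML($dom->getElementsByTagName('p')->item(0));
    echo $html, PHP_EOL;
    ```

    Note that $dom->saveHTML() called without an argument also emits the injected meta tag, so you may want to serialize only the nodes you need, as done here.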
