PHP DomDocument - why is en dash “–” converted to â€"

Submitted by 元气小坏坏 on 2020-05-28 05:32:25

Question


I am using DOMDocument to extract some paragraphs.

Here is what the initial HTML file that I am importing looks like:

<html>
    <head>
        <title>Toxins</title>
    </head>

    <body>
        <p class=8reference><span>1.</span><span>Sivonen, K.; Jones, G. Cyanobacterial Toxins. In <i>Toxic Cyanobacteria in Water. A Guide to Their Public Health Consequences, Monitoring and Management</i>; Chorus, I., Bartram, J., Eds.; E. and F.N. Spon: London, UK, 1999; pp. 41–111.</span></p>
    </body>
</html>

When I run:

$dom_input = new \DOMDocument("1.0","UTF-8");
$dom_input->encoding = "UTF-8";
$dom_input->formatOutput = true;
$dom_input->loadHTMLFile($manuscript->getUploadRootDir().$manuscript->getFileName());

$paragraphs = $dom_input->getElementsByTagName('p');

foreach ($paragraphs as $paragraph) {
    if($paragraph->getAttribute('class') == "8reference") {
        var_dump($paragraph->nodeValue);
    }
}

The dash in "pp. 41–111" is converted to

pp. 41â€"111

Any idea why this happens, and how I can fix it so that I get proper UTF-8 values?

Thank you in advance.


Answer 1:


It looks to me like the data is correct, you're just displaying it incorrectly.

Are you outputting in UTF-8?

The â€" sequence is a classic symptom of showing UTF-8 encoded data as if it were in some encoding other than UTF-8.

For example, if you're outputting to a web browser, try setting the character set with a meta tag:
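
To make the mechanism concrete, here is a minimal sketch that reproduces the garbling on purpose. It assumes the mbstring extension is available and that the viewer misreads the bytes as Windows-1252; the variable names are illustrative only.

```php
<?php
// The en dash (U+2013) is three bytes in UTF-8: E2 80 93.
$dash = "\u{2013}";

// Simulate a viewer that wrongly assumes Windows-1252: treat each UTF-8 byte
// as one Windows-1252 character, then re-encode the result as UTF-8.
$garbled = mb_convert_encoding($dash, 'UTF-8', 'Windows-1252');

echo bin2hex($dash), "\n";     // e28093
echo bin2hex($garbled), "\n";  // c3a2e282ace2809c: a-circumflex, euro sign, left double quote

// The damage is reversible as long as no bytes were dropped along the way.
$restored = mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');
var_dump($restored === $dash); // bool(true)
```

Dumping `bin2hex($paragraph->nodeValue)` in the same way is one quick way to check whether the stored bytes are already correct UTF-8 and only the display is at fault.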

<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">

If you need to output in something other than UTF-8 you'll need to convert into the alternative character set first.
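
Both options above can be sketched roughly as follows. This assumes the mbstring extension is loaded; `render_utf8_page()` and `convert_from_utf8()` are hypothetical helper names, not part of PHP or of the asker's code.

```php
<?php
// Hypothetical helper: wrap text in a minimal page that declares UTF-8.
function render_utf8_page(string $text): string
{
    $safe = htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    return "<!DOCTYPE html>\n<html><head>"
        . '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">'
        . "</head><body><p>{$safe}</p></body></html>";
}

// Hypothetical helper: convert UTF-8 text for a consumer that expects another charset.
function convert_from_utf8(string $text, string $target): string
{
    return mb_convert_encoding($text, $target, 'UTF-8');
}

$reference = "pp. 41\u{2013}111";

// Option 1: keep UTF-8 and say so, in the HTTP header as well as the meta tag.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}
echo render_utf8_page($reference), "\n";

// Option 2: convert first when the consumer expects, say, Windows-1252.
$legacy = convert_from_utf8($reference, 'Windows-1252');
echo bin2hex($legacy), "\n";
```

Sending the charset in the HTTP header is usually the more robust choice, since the header normally takes precedence over a meta tag in the page.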




Answer 2:


When using PHP's fputcsv() to generate a CSV file, convert the data before passing it to fputcsv():

$data = mb_convert_encoding($data, 'cp1252', 'utf-8');
fputcsv($file, $data);

This converts the en dash to its single-byte cp1252 form, which should stop it from showing up as â€" when the CSV is opened by a program that assumes cp1252.
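
A self-contained sketch of the same idea, assuming the mbstring extension is loaded and that the consumer of the file reads it as Windows-1252 (cp1252). `rows_to_cp1252_csv()` is a hypothetical helper name, and the sample row is made up from the reference in the question.

```php
<?php
// Hypothetical helper: build CSV text with every field converted from UTF-8 to cp1252.
function rows_to_cp1252_csv(array $rows): string
{
    $handle = fopen('php://temp', 'r+');
    foreach ($rows as $row) {
        // Convert field by field so that each row stays a plain array of strings.
        $converted = array_map(
            static function ($field) {
                return mb_convert_encoding((string) $field, 'Windows-1252', 'UTF-8');
            },
            $row
        );
        // Delimiter, enclosure and escape are passed explicitly.
        fputcsv($handle, $converted, ',', '"', '\\');
    }
    rewind($handle);
    $csv = stream_get_contents($handle);
    fclose($handle);
    return $csv;
}

$csv = rows_to_cp1252_csv([['Sivonen, K.; Jones, G.', "pp. 41\u{2013}111"]]);

// In cp1252 the en dash is the single byte 0x96.
var_dump(strpos($csv, "41\x96111") !== false); // bool(true)
```

With mbstring's default settings, characters that have no cp1252 equivalent are replaced by a substitute character, so this approach only suits data that fits into that character set.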



Source: https://stackoverflow.com/questions/19959794/php-domdocument-why-is-en-dash-converted-to-%c3%a2%e2%82%ac
