How to deal with accent problems using HTMLAgilityPack

给你一囗甜甜゛ submitted on 2019-12-13 21:13:21

Question


I'm trying to extract the text from an HTML file, but the following text appears inside a tag:

<h3>Café</h3>

and when I extract the text using the following code:

htmlDocument.DocumentNode.SelectSingleNode("some XPath").InnerText;

I get this string: "Cafédirect". How can I fix this?


Answer 1:


I've answered this here; basically, you can ask HtmlAgilityPack to detect the encoding of the HTML document.

HTMLAgilityPack Asp.net C# Error Handling
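
The linked answer is not reproduced here, so the following is only a rough sketch of what detecting the encoding could look like. It assumes the HtmlAgilityPack package is referenced, that the HTML lives in a local file called page.html (a hypothetical name), and that the heading can be reached with the illustrative XPath //h3. The fallback to UTF-8 when nothing is detected is my own assumption, not part of the answer.

```csharp
using System;
using System.Text;
using HtmlAgilityPack;

class DetectEncodingSketch
{
    static void Main()
    {
        // Hypothetical input file, used only for illustration.
        const string path = "page.html";

        var htmlDocument = new HtmlDocument();

        // Ask HtmlAgilityPack which encoding the file appears to use.
        // Assumption: fall back to UTF-8 if no encoding is reported.
        Encoding detected = htmlDocument.DetectEncoding(path) ?? Encoding.UTF8;

        // Load the file with the detected encoding instead of the default one.
        htmlDocument.Load(path, detected);

        // "//h3" is an illustrative XPath; SelectSingleNode returns null
        // when nothing matches, hence the null-conditional access.
        HtmlNode node = htmlDocument.DocumentNode.SelectSingleNode("//h3");
        Console.WriteLine(node?.InnerText);
    }
}
```

HtmlDocument also offers DetectEncodingAndLoad(path), which combines the two steps if you do not need to inspect the encoding yourself.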




Answer 2:


I found the answer while working on this. Here it is:

htmlDocument.OptionDefaultStreamEncoding = Encoding.UTF8;

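By default, the encoding is System.Text.Encoding.Default (the system's ANSI code page on .NET Framework), so UTF-8 bytes are misread. With UTF-8 set, the accented characters are read correctly.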
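
To see why the encoding matters, here is a small sketch that uses only System.Text and no HtmlAgilityPack calls. It encodes "Café" as UTF-8 and decodes the bytes twice: once with ISO-8859-1, which stands in here for a single-byte default code page (an assumption, since the real default depends on the machine), and once with UTF-8. The class and method names are made up for illustration.

```csharp
using System;
using System.Text;

static class EncodingMismatchDemo
{
    // Encodes the text as UTF-8, then decodes those bytes with the
    // supplied encoding, mimicking a reader that assumes that encoding.
    public static string DecodeUtf8BytesAs(string text, Encoding readerEncoding)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return readerEncoding.GetString(utf8Bytes);
    }

    static void Main()
    {
        // ISO-8859-1 is a stand-in for a single-byte default code page.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        const string original = "Caf\u00E9"; // "Café"

        Console.OutputEncoding = Encoding.UTF8;

        // 'é' is the two bytes C3 A9 in UTF-8; read as single bytes
        // they become 'Ã' and '©'.
        Console.WriteLine(DecodeUtf8BytesAs(original, latin1));        // CafÃ©
        Console.WriteLine(DecodeUtf8BytesAs(original, Encoding.UTF8)); // Café
    }
}
```

Setting OptionDefaultStreamEncoding to Encoding.UTF8 corresponds to the second call: the bytes are decoded with the same encoding the file was written in.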



Source: https://stackoverflow.com/questions/18308059/how-to-deal-with-accent-problems-using-htmlagilitypack
