Question
I'm trying to extract the text from an HTML file, but one of its tags contains the following text:
<h3>Café</h3>
and when I extract the text using the following code:
htmlDocument.DocumentNode.SelectSingleNode("some XPath").InnerText;
I get this string: "Cafédirect". How can I fix this?
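To show what usually causes this kind of garbled accent, here is a small stdlib-only sketch. It assumes (this is not stated in the question) that the HTML file is saved as UTF-8 and that something decoded its bytes with a single-byte code page; ISO-8859-1 stands in for that code page here. The two UTF-8 bytes of "é" (0xC3 0xA9) are then read as two separate characters.

```csharp
using System;
using System.Text;

// "Cafédirect" written with an escape so the source file's own encoding doesn't matter.
byte[] utf8Bytes = Encoding.UTF8.GetBytes("Caf\u00E9direct");

// Assumption: the reader used a single-byte code page instead of UTF-8.
// ISO-8859-1 maps every byte to one character, so 0xC3 0xA9 becomes "Ã©".
Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
string misread = latin1.GetString(utf8Bytes);

Console.WriteLine(misread);
```

Decoding the same bytes with `Encoding.UTF8.GetString` gives back the original "Cafédirect", which is why both answers below come down to picking the right encoding when the document is read.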
Answer 1:
I've answered this here: HTMLAgilityPack Asp.net C# Error Handling. Basically, you can ask HtmlAgilityPack to detect the encoding of the HTML document.
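The answer doesn't show HtmlAgilityPack's detection call, so this sketch does not use HtmlAgilityPack at all. It illustrates only one detection mechanism, byte-order-mark (BOM) sniffing, using the BCL's `StreamReader` with `detectEncodingFromByteOrderMarks: true`. HtmlAgilityPack's own detection may also look at the charset the document declares; BOM sniffing alone cannot recognize a UTF-8 file that has no BOM. The file bytes are made up for the example.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical file: a UTF-8 BOM (EF BB BF) followed by UTF-8 encoded HTML.
byte[] bom = Encoding.UTF8.GetPreamble();
byte[] body = Encoding.UTF8.GetBytes("<h3>Caf\u00E9direct</h3>");
byte[] fileBytes = new byte[bom.Length + body.Length];
bom.CopyTo(fileBytes, 0);
body.CopyTo(fileBytes, bom.Length);

// ISO-8859-1 is only the fallback; if a BOM is found, StreamReader switches to
// the encoding it indicates and skips the BOM bytes.
using var reader = new StreamReader(
    new MemoryStream(fileBytes),
    Encoding.GetEncoding("iso-8859-1"),
    detectEncodingFromByteOrderMarks: true);
string html = reader.ReadToEnd();

// CurrentEncoding reflects the detected encoding only after the first read.
Console.WriteLine(reader.CurrentEncoding.WebName); // utf-8
```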
Answer 2:
I've found the answer myself; here it is:
htmlDocument.OptionDefaultStreamEncoding = Encoding.UTF8;
By default, the encoding is System.Text.Encoding.Default; with UTF-8, accented characters are read correctly.
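A stdlib-only sketch of why this setting helps. As I understand it, when HtmlAgilityPack loads a stream or file without an explicit encoding, it reads the bytes through a `StreamReader` using `OptionDefaultStreamEncoding`; the code below mimics that step with a plain `StreamReader` and does not call HtmlAgilityPack. It also suggests why the problem is mostly seen on .NET Framework, where `Encoding.Default` is the system ANSI code page; on .NET Core and later, `Encoding.Default` is UTF-8. Because `LoadHtml(string)` receives text that is already decoded, this option should have no effect there. ISO-8859-1 is an assumed stand-in for a legacy code page, and `ReadAll` is a hypothetical helper.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical UTF-8 file content without a BOM, so no auto-detection can kick in.
byte[] fileBytes = Encoding.UTF8.GetBytes("<h3>Caf\u00E9direct</h3>");

// Hypothetical helper: read all bytes as text using the given encoding.
static string ReadAll(byte[] bytes, Encoding encoding)
{
    using var reader = new StreamReader(new MemoryStream(bytes), encoding);
    return reader.ReadToEnd();
}

// Single-byte code page (stand-in for a legacy default): accents get garbled.
string asLatin1 = ReadAll(fileBytes, Encoding.GetEncoding("iso-8859-1"));
// UTF-8, as set through OptionDefaultStreamEncoding in the answer: text is intact.
string asUtf8 = ReadAll(fileBytes, Encoding.UTF8);

Console.WriteLine(asLatin1);
Console.WriteLine(asUtf8);
```

Here `asLatin1` holds "<h3>CafÃ©direct</h3>" while `asUtf8` holds "<h3>Cafédirect</h3>".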
Source: https://stackoverflow.com/questions/18308059/how-to-deal-with-accent-problems-using-htmlagilitypack