Encoding differences between using WebClient and WebRequest?

情到浓时终转凉″ submitted on 2019-12-10 16:50:00

Question


When fetching the index page of a Spanish newspaper with WebRequest, the diacritics don't come through properly; they show up as this odd character: �. When I download the response from the same URI with WebClient, I get the correct text.

Why do the two behave differently?

var client = new WebClient();
string html = client.DownloadString(endpoint);

vs

WebRequest request = WebRequest.Create(endpoint);
using (WebResponse response = request.GetResponse())
using (Stream stream = response.GetResponseStream())
// No encoding argument: StreamReader decodes as UTF-8 unless the data starts with a byte-order mark
using (StreamReader reader = new StreamReader(stream))
{
    string html = reader.ReadToEnd();
}

Answer 1:


By creating your StreamReader without explicitly setting an encoding, you're assuming the entity body is UTF-8. You should check the CharacterSet property of the HttpWebResponse (it isn't exposed by the WebResponse base class, so you need to cast the response), and open the StreamReader with the matching encoding.
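A minimal sketch of that fix, assuming you want to fall back to UTF-8 when the server declares no charset or one the runtime doesn't recognise (that fallback policy is my assumption, not part of the answer). CharsetAwareReader, ResolveEncoding and ReadBody are hypothetical helper names. On .NET Core / .NET 5+, legacy code pages such as windows-1252 are only available after registering CodePagesEncodingProvider; ISO-8859-1 and UTF-8 work out of the box.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class CharsetAwareReader
{
    // Hypothetical helper: maps a charset label (e.g. HttpWebResponse.CharacterSet)
    // to an Encoding, falling back to UTF-8 (an assumed policy) when it is missing or unknown.
    public static Encoding ResolveEncoding(string charset)
    {
        if (string.IsNullOrWhiteSpace(charset))
            return Encoding.UTF8;
        try
        {
            // Some servers quote the value, e.g. charset="utf-8".
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            // Not a valid code page name, or not supported on this platform.
            return Encoding.UTF8;
        }
    }

    // Hypothetical helper: reads the whole body using the charset the server declared.
    public static string ReadBody(HttpWebResponse response)
    {
        Encoding encoding = ResolveEncoding(response.CharacterSet);
        using (Stream stream = response.GetResponseStream())
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

In the question's code you would cast the response, `var response = (HttpWebResponse)request.GetResponse();`, and then call `CharsetAwareReader.ReadBody(response)` instead of building the StreamReader by hand.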

Otherwise, when the reader decodes data that isn't UTF-8 as if it were, it hits octet sequences that aren't valid UTF-8, and the best it can do is substitute the replacement character U+FFFD (�).
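To see the substitution happen without any network access, the sketch below assumes the page was served as ISO-8859-1 (the actual newspaper's charset isn't given in the question) and decodes the same bytes twice. ReplacementCharDemo is a hypothetical name used only for illustration.

```csharp
using System;
using System.Text;

static class ReplacementCharDemo
{
    // Decodes the same bytes with the wrong encoding (UTF-8) and the right one (ISO-8859-1).
    public static (string AsUtf8, string AsLatin1) DecodeBothWays()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // In ISO-8859-1, 'ñ' (U+00F1) is the single byte 0xF1,
        // so "España" becomes 45 73 70 61 F1 61.
        byte[] body = latin1.GetBytes("Espa\u00F1a");

        // 0xF1 is a UTF-8 lead byte for a 4-byte sequence, but the next byte 0x61 ('a')
        // is not a continuation byte, so the decoder emits U+FFFD in place of 0xF1.
        string asUtf8 = Encoding.UTF8.GetString(body);
        string asLatin1 = latin1.GetString(body);
        return (asUtf8, asLatin1);
    }
}
```

Decoding as UTF-8 yields "Espa�a", while decoding with the encoding the bytes were actually written in yields "España".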

WebClient does pretty much this for you. DownloadString is a higher-level method: where WebRequest and its derived classes let you work at a lower level, it bundles everything into a single call: "send a GET request to the URI, examine the headers to see which content-encoding is in use (in case the body needs to be un-gzipped or otherwise decompressed), see which character encoding is declared, set up a text reader over the stream with that encoding, and then call ReadToEnd()". The usual pros and cons of high-level, big-chunk instructions versus low-level, small-chunk ones apply.
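The "look at the declared charset, then decode" step can be sketched on its own. This is a simplified illustration, not WebClient's actual source: DownloadStringSketch and DecodeBody are hypothetical names, it only reads the charset parameter of the Content-Type header (parsed with System.Net.Mime.ContentType), otherwise uses a caller-supplied fallback, and it does no decompression.

```csharp
using System;
using System.Net.Mime;
using System.Text;

static class DownloadStringSketch
{
    // Hypothetical helper: decodes a fully downloaded body using the charset
    // declared in the Content-Type header, or the given fallback encoding.
    public static string DecodeBody(byte[] body, string contentTypeHeader, Encoding fallback)
    {
        Encoding encoding = fallback;
        if (!string.IsNullOrEmpty(contentTypeHeader))
        {
            try
            {
                // "text/html; charset=ISO-8859-1" -> "ISO-8859-1"; null when no charset is given.
                string charset = new ContentType(contentTypeHeader).CharSet;
                if (!string.IsNullOrEmpty(charset))
                    encoding = Encoding.GetEncoding(charset);
            }
            catch (FormatException)
            {
                // Malformed Content-Type header: keep the fallback.
            }
            catch (ArgumentException)
            {
                // Unknown or unsupported charset name: keep the fallback.
            }
        }
        return encoding.GetString(body);
    }
}
```

With a real request you could feed it from WebClient's lower-level calls, for example `byte[] body = client.DownloadData(endpoint);` followed by `client.ResponseHeaders[HttpResponseHeader.ContentType]`, which is roughly the work DownloadString saves you.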



Source: https://stackoverflow.com/questions/9019773/encoding-differences-between-using-webclient-and-webrequest
