Question
When fetching the index page of a Spanish newspaper with WebRequest, the diacritics do not come through properly; they show up as this odd character: �. Downloading the response from the same URI with a WebClient gives the correct text.
Why the difference?
var client = new WebClient();
string html = client.DownloadString(endpoint);
vs
WebRequest request = WebRequest.Create(endpoint);
using (WebResponse response = request.GetResponse())
{
    Stream stream = response.GetResponseStream();
    StreamReader reader = new StreamReader(stream);
    string html = reader.ReadToEnd();
}
Answer 1:
You're implicitly assuming that the entity is in UTF-8, because you create the StreamReader without explicitly setting an encoding. You should examine the CharacterSet property of the HttpWebResponse (it is not exposed by the WebResponse base class) and open the StreamReader with the appropriate encoding.
Otherwise, if it reads something that isn't UTF-8 as if it were UTF-8, it will come across octet sequences that aren't valid in UTF-8 and has to substitute the U+FFFD replacement character (�) as the best it can do.
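To make the substitution concrete, here is a small sketch that needs no network access. It assumes the page was served as ISO-8859-1, which is only a guess about the newspaper in the question; the class and method names are made up for illustration.

```csharp
using System;
using System.Text;

static class ReplacementCharDemo
{
    // Hypothetical helper: encodes text as ISO-8859-1 (an assumed page encoding),
    // then decodes those bytes as UTF-8, the default a StreamReader uses
    // when no encoding is passed to it.
    public static string DecodeLatin1BytesAsUtf8(string text)
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        byte[] bytes = latin1.GetBytes(text);
        return Encoding.UTF8.GetString(bytes);
    }

    static void Main()
    {
        // "\u00F1" is the letter n with tilde; in ISO-8859-1 it is the single byte 0xF1.
        string original = "Espa\u00F1a";
        string wrong = DecodeLatin1BytesAsUtf8(original);

        // 0xF1 followed by 'a' is not a valid UTF-8 sequence, so the decoder
        // substitutes U+FFFD for it.
        Console.WriteLine(wrong.IndexOf('\uFFFD') >= 0); // True
        Console.WriteLine(wrong == original);            // False
    }
}
```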
WebClient does pretty much this for you. DownloadString is a higher-level method: where WebRequest and its derived classes let you work at a lower level, it offers a single call for "send a GET request to the URI, examine the headers to see what content-encoding is in use in case the body needs to be un-gzipped or otherwise decompressed, see what character encoding is in place, set up a text reader with that encoding and the stream, and then call ReadToEnd()". The usual pros and cons of high-level, coarse-grained calls versus low-level, fine-grained ones apply.
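Done by hand with the lower-level API, the character-encoding part of those steps might look like the sketch below. HtmlFetcher, ResolveEncoding and Download are hypothetical names, and falling back to UTF-8 when the response gives no usable charset is an assumption of this sketch, not something WebClient is documented to do. Decompression is left out.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class HtmlFetcher
{
    // Hypothetical helper: maps a charset name to an Encoding,
    // falling back to UTF-8 (an assumption) when the name is missing or unknown.
    public static Encoding ResolveEncoding(string charset)
    {
        if (string.IsNullOrWhiteSpace(charset))
            return Encoding.UTF8;
        try
        {
            // Some servers quote the charset value, so strip quotes and whitespace.
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (Exception ex) when (ex is ArgumentException || ex is NotSupportedException)
        {
            return Encoding.UTF8;
        }
    }

    public static string Download(string endpoint)
    {
        WebRequest request = WebRequest.Create(endpoint);
        using (WebResponse response = request.GetResponse())
        {
            // CharacterSet lives on HttpWebResponse, not on the WebResponse base class.
            var http = response as HttpWebResponse;
            Encoding encoding = ResolveEncoding(http?.CharacterSet);

            using (Stream stream = response.GetResponseStream())
            using (var reader = new StreamReader(stream, encoding))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

With this in place, `string html = HtmlFetcher.Download(endpoint);` would replace the StreamReader code from the question, with `endpoint` being the same URI string used there.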
Source: https://stackoverflow.com/questions/9019773/encoding-differences-between-using-webclient-and-webrequest