HttpWebRequest: Receiving response with the right encoding

被撕碎了的回忆 2021-01-07 01:26

I'm currently downloading an HTML page, using the following code:

Try
    Dim req As System.Net.HttpWebRequest = DirectCast(WebRequest.Create(URL), HttpWebR         


        
3 answers
  •  孤街浪徒
    2021-01-07 01:32

    Daniel, some pages do not even return a value in CharacterSet, so this approach is not very reliable. Sometimes not even browsers are able to "guess" which encoding to use, so I don't think 100% reliable encoding recognition is possible.

    In my particular case, as I deal with Spanish or Portuguese pages, I use the UTF7 encoding, and it works fine for me (áéíóúñÑêã... etc).

    Maybe you could first load a table of CharacterSet codes and their corresponding Encoding. Then, when CharacterSet is empty, you can fall back to a default encoding.
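
    A minimal sketch of that lookup-with-fallback idea. ResolveEncoding is a hypothetical helper name, not something from the question. It relies on Encoding.GetEncoding as the name-to-encoding table, so it only recognizes names the runtime has registered (on .NET Core and later, code-page encodings may need CodePagesEncodingProvider registered first).

```vbnet
Imports System.Text

Module CharsetLookup
    ' Hypothetical helper: map a response's CharacterSet value to an Encoding,
    ' returning the caller-supplied fallback when the value is empty or unknown.
    Function ResolveEncoding(charset As String, fallback As Encoding) As Encoding
        If String.IsNullOrWhiteSpace(charset) Then Return fallback
        Try
            ' Some servers wrap the charset in quotes, so strip them first.
            Return Encoding.GetEncoding(charset.Trim().Trim(""""c))
        Catch ex As ArgumentException
            ' GetEncoding throws ArgumentException for names it does not know.
            Return fallback
        End Try
    End Function
End Module
```

    It could be called as ResolveEncoding(resp.CharacterSet, Encoding.UTF8), where resp is the HttpWebResponse and the UTF8 default is only an assumption; pick whatever default suits the pages you download.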

    The detectEncodingFromByteOrderMarks parameter of the StreamReader constructor may help a little, as it automatically detects some encodings from the very first bytes of the stream.
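
    A rough sketch of using that parameter. ReadBody is a hypothetical helper name, and the fallback argument is an assumption about how you want to handle pages without a byte order mark: only when a BOM is present (UTF-8, UTF-16 or UTF-32) does the reader override the encoding you pass in.

```vbnet
Imports System.IO
Imports System.Net
Imports System.Text

Module BomAwareReader
    ' Hypothetical helper: read the whole response body as text.
    ' Passing True as the third argument enables detectEncodingFromByteOrderMarks,
    ' so a BOM at the start of the stream takes precedence over the fallback.
    Function ReadBody(resp As HttpWebResponse, fallback As Encoding) As String
        Using stream As Stream = resp.GetResponseStream()
            Using reader As New StreamReader(stream, fallback, True)
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function
End Module
```

    Most HTML pages are served without a BOM, so in practice the fallback encoding still decides the result for the majority of responses.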
