HttpWebRequest: Receiving response with the right encoding

被撕碎了的回忆 2021-01-07 01:26

I'm currently downloading an HTML page, using the following code:

Try
    Dim req As System.Net.HttpWebRequest = DirectCast(WebRequest.Create(URL), HttpWebR         


        
3 Answers
  •  北海茫月
    2021-01-07 01:27

    Gap's site is wrong. The specific problem is that their page claims an encoding of Latin1 (ISO-8859-1), while the page uses character #146 which is not valid in ISO-8859-1.

    That character is, however, valid in the Windows CP-1252 encoding (which is a superset of ISO 8859-1). In CP-1252, character code #146 is the right single quotation mark (’). You'll see it used as an apostrophe in "You’ll find Petites and small sizes" in today's text on the Gap.com home page.
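
    To make the difference concrete, here is a minimal sketch in Python (not the VB.NET from the question; it is used only because the byte-level behaviour is easy to show). The sample bytes are made up for illustration and are not taken from the actual page. Python's `latin-1` codec turns byte 146 into the invisible control character U+0092, while `cp1252` turns it into the right single quotation mark U+2019.

```python
# Hypothetical sample: the phrase as it might be stored in a CP-1252 page,
# with byte 146 (0x92) standing in for the apostrophe.
raw = b"You\x92ll find Petites and small sizes"

# Decoding with the encoding the page *claims* (ISO-8859-1 / Latin-1).
as_latin1 = raw.decode("latin-1")

# Decoding with the encoding the page most likely *uses* (Windows CP-1252).
as_cp1252 = raw.decode("cp1252")

# Compare the code point produced for byte 146 in each case.
print(hex(ord(as_latin1[3])))  # code point under Latin-1
print(hex(ord(as_cp1252[3])))  # code point under CP-1252
print(as_cp1252)
```

    The same idea should carry over to .NET by passing an explicit CP-1252 encoding to whatever reads the response stream, instead of relying on the charset the server declares.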

    You can read http://en.wikipedia.org/wiki/Windows-1252 for more details. Turns out this kind of thing is a common problem on web pages where the content was originally saved in the CP-1252 encoding (e.g. copy/pasted from Word).
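
    Because the mislabeling is so common, one pragmatic workaround is to treat a declared Latin-1 charset as CP-1252 when decoding, which is also how web browsers interpret that label. The sketch below is one possible way to do this in Python; `decode_page` and its parameters are hypothetical names introduced for illustration, not part of any library.

```python
# Labels that a server may use for Latin-1; treated here as CP-1252.
LATIN1_LABELS = {"iso-8859-1", "iso8859-1", "latin1", "latin-1"}


def decode_page(body: bytes, declared_charset: str) -> str:
    """Decode a response body, reading a Latin-1 label as CP-1252.

    Hypothetical helper: `body` is the raw response and `declared_charset`
    is whatever charset the server or page announced.
    """
    label = declared_charset.strip().lower()
    if label in LATIN1_LABELS:
        label = "cp1252"
    # errors="replace" turns the few bytes CP-1252 leaves undefined
    # into U+FFFD instead of raising UnicodeDecodeError.
    return body.decode(label, errors="replace")


text = decode_page(b"You\x92ll find Petites and small sizes", "ISO-8859-1")
print(text)
```

    Pages that really are Latin-1 still decode correctly this way, since the two encodings only differ in the byte range 128 to 159, which Latin-1 reserves for control characters.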

    Moral of the story here: always store internationalized text as Unicode in your database, and always emit HTML as UTF-8 on your web server!
