HttpResponseMessage.Content.Header ignoring charset setting in meta tag in html source

时光毁灭记忆、已成空白 submitted on 2019-12-25 08:48:33

Question


I just posted another question, which was answered right away. That answer, in turn, raises the following new question:

If my understanding is correct, the StreamContent object of an HttpResponseMessage is created when an HTTP request is made via HttpClient.GetAsync. Its Headers property, or part of it, is set according to the meta tags included in the HTML source file.

For instance, a meta tag can tell the response object which charset to use when decoding the file's contents.

<meta http-equiv='Content-Type' content='text/html; charset=utf-8' />

Requesting a resource that contains such a line will produce an HttpResponseMessage.Content.Headers with this setting.

In the other question referenced at the top of this one, I mention a response object being created without the correct encoding. The HTML source behind that badly decoded response does contain the setting that should make responses come out properly encoded:

<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1255">

So why do responses from that site not pick up the charset setting in the meta tag, and end up rendered in the wrong charset?

Here's a pictorial description of the question: both sites contain the meta tag with charset setting, but one, for some reason, misses it...


Fiddler's header details for both requests:

Working one: (removed cookie header)

Request:

GET http://www.ynet.co.il/home/0,7340,L-8,00.html HTTP/1.1
Host: www.ynet.co.il
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
If-Modified-Since: Thu, 31 Mar 2016 10:04:39 GMT

Response:

HTTP/1.1 200 OK
vg_id: 1
X-me: 06
Content-Type: text/html; charset=UTF-8
Last-Modified: Thu, 31 Mar 2016 10:38:57 GMT
Accept-Ranges: bytes
VX-Cache: HIT
WAI: 01
V-TTL: 0
backend-cache-control: 
Content-Length: 410685
Vary: Accept-Encoding
Date: Thu, 31 Mar 2016 10:38:48 GMT
Connection: keep-alive

Problematic one:

Request:

GET http://winedepot.co.il/ HTTP/1.1
Host: winedepot.co.il
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie: __utma=201832727.725995063.1458660502.1459413977.1459418530.8; __utmz=201832727.1458660502.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none); __utmc=201832727; ASPSESSIONIDCQTRQCAQ=FEOHEBFCBGABBKOBAHOGKBGB
Connection: keep-alive

Response:

HTTP/1.1 200 OK
Cache-Control: private
Content-Length: 118225
Content-Type: text/html
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Thu, 31 Mar 2016 10:36:21 GMT

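Answer 1:


As you can see from your Fiddler captures, HttpResponseMessage.Content.Headers.ContentType will contain exactly what was specified in the Content-Type header of the response. The working site sends "text/html; charset=UTF-8", while the problematic one sends only "text/html", with no charset at all.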

The HttpResponseMessage will not parse the response HTML and search for any <meta /> tags.
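
To make the point concrete, here is a minimal, language-neutral sketch in Python (not the .NET API itself) that extracts the charset from the two Content-Type header values shown in the Fiddler captures above. The helper name `charset_from_content_type` is made up for this illustration. It only ever looks at the header string, which is roughly what any HTTP client library has to work with before it touches the body.

```python
from email.message import Message


def charset_from_content_type(value):
    """Return the charset parameter of a Content-Type header value, or None.

    Hypothetical helper for illustration; it inspects the header only and
    never looks at the response body.
    """
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() lowercases the value and returns None when
    # the header carries no charset parameter.
    return msg.get_content_charset()


# Header values copied from the two responses in the question.
working = charset_from_content_type("text/html; charset=UTF-8")
problematic = charset_from_content_type("text/html")

print(working)      # utf-8
print(problematic)  # None
```

The second result is the whole story: the server for the problematic site never states a charset in the header, so the client has nothing to report, no matter what the HTML says.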




Answer 2:


The content type comes from the HTTP header:

https://en.wikipedia.org/wiki/List_of_HTTP_header_fields

<meta http-equiv='Content-Type' content='text/html; charset=utf-8' />

is part of the content and not part of the headers.
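
If the header carries no charset, the caller has to inspect the body itself. The following is a rough sketch in Python, under stated assumptions, of how that could be done: read the raw bytes, look for a charset declaration in a meta tag near the start of the document, and decode with it. The names `META_CHARSET` and `decode_html`, the 1024-byte scan window, and the UTF-8 fallback are all choices made for this example, and the regular expression is a simplification rather than a full HTML parser. In .NET, the analogous steps would presumably be reading the bytes with `ReadAsByteArrayAsync` and decoding with `Encoding.GetEncoding`.

```python
import re

# Simplified pattern: finds charset=... inside a <meta> tag.
# This is an illustration, not a complete HTML encoding sniffer.
META_CHARSET = re.compile(
    rb"""<meta[^>]*charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def decode_html(body, header_charset=None, fallback="utf-8"):
    """Decode raw HTML bytes, preferring the HTTP header charset.

    Falls back to a charset declared in a <meta> tag within the first
    1024 bytes (an assumption of this sketch), then to `fallback`.
    """
    encoding = header_charset
    if not encoding:
        match = META_CHARSET.search(body[:1024])
        encoding = match.group(1).decode("ascii") if match else fallback
    try:
        return body.decode(encoding, errors="replace")
    except LookupError:
        # Unknown encoding name: use the fallback instead of failing.
        return body.decode(fallback, errors="replace")


# A tiny page that mimics the problematic site: no charset in the header,
# windows-1255 declared in the meta tag, Hebrew text in the body.
hebrew = "\u05e9\u05dc\u05d5\u05dd"
sample = (
    '<html><head><meta HTTP-EQUIV="Content-Type" '
    'CONTENT="text/html; charset=windows-1255"></head>'
    "<body>" + hebrew + "</body></html>"
).encode("windows-1255")

text = decode_html(sample)  # header_charset is None, so the meta tag is used
print(hebrew in text)
```

Decoding the same bytes as UTF-8 would turn the Hebrew letters into replacement characters, which matches the garbled output described in the question.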

I suggest you install Fiddler to better understand what those requests actually do. Set Fiddler as your proxy and use the inspectors to see what is actually sent and received when you make HTTP requests.

A fuller explanation is beyond the scope of this answer.



Source: https://stackoverflow.com/questions/36329642/httpresponsemessage-content-header-ignoring-charset-setting-in-meta-tag-in-html
