WebRequest “HEAD” lightweight alternative

予麋鹿 2020-12-06 13:21

I recently discovered that the following does not work with certain sites, such as IMDB.com.

class Program
    {
        static void Main(string[] args)
             


        
3 Answers
  • 2020-12-06 13:29

    Open the connection yourself with a socket (instead of an HttpWebRequest or WebClient), and close the stream as soon as you've read the status code. Fortunately, the status code is on the very first line of the response :)
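
    A minimal sketch of that idea, assuming plain HTTP on port 80 (an HTTPS site would need the stream wrapped in an SslStream first, which is not shown). The class and method names are made up for illustration. It sends a GET, reads only the status line, and disposes the socket before any body is consumed.

    ```csharp
    using System;
    using System.IO;
    using System.Net.Sockets;
    using System.Text;

    // Hypothetical helper, not part of the original question.
    static class RawStatusCheck
    {
        // Parses the numeric code out of a status line such as "HTTP/1.1 200 OK".
        // Returns -1 if the line does not look like an HTTP status line.
        public static int ParseStatusCode(string statusLine)
        {
            if (string.IsNullOrEmpty(statusLine) ||
                !statusLine.StartsWith("HTTP/", StringComparison.Ordinal))
                return -1;

            string[] parts = statusLine.Split(' ');
            int code;
            return parts.Length >= 2 && int.TryParse(parts[1], out code) ? code : -1;
        }

        // Sends a minimal GET over plain TCP and reads only the first response line.
        public static int GetStatusCode(string host, string path, int port = 80)
        {
            using (var client = new TcpClient(host, port))
            using (NetworkStream stream = client.GetStream())
            {
                stream.ReadTimeout = 5000; // milliseconds

                string request =
                    "GET " + path + " HTTP/1.1\r\n" +
                    "Host: " + host + "\r\n" +
                    "Connection: close\r\n\r\n";
                byte[] bytes = Encoding.ASCII.GetBytes(request);
                stream.Write(bytes, 0, bytes.Length);

                using (var reader = new StreamReader(stream, Encoding.ASCII))
                {
                    // The status line is the first line of the response.
                    return ParseStatusCode(reader.ReadLine());
                }
            } // leaving the using blocks closes the socket
        }
    }
    ```

    Calling `RawStatusCheck.GetStatusCode("example.com", "/")` would return the status code as an int. Note that a site which redirects plain HTTP to HTTPS will answer with a 3xx code here rather than the status of the final page.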

  • 2020-12-06 13:39

    You'll have to clarify what you mean by "lightweight". What are you trying to accomplish?

    Whether you can use GET, POST, HEAD, DELETE, etc. depends on the URL and on how the application serving that URL is configured.

    If all you're trying to do is see whether you can make a connection without actually downloading the content, you could try just opening a connection to port 80 using sockets, but there isn't really a reliable or universally supported way to do it just by changing the HTTP method.
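
    As a rough sketch of the "just connect to port 80" check, something like the following might do. `PortCheck` and `CanConnect` are hypothetical names, and the check only shows that something accepts TCP connections on that port, not that any particular URL exists.

    ```csharp
    using System;
    using System.Net.Sockets;

    // Hypothetical helper, not part of the original question.
    static class PortCheck
    {
        // Returns true if a TCP connection to host:port is established within timeoutMs.
        public static bool CanConnect(string host, int port, int timeoutMs)
        {
            try
            {
                using (var client = new TcpClient())
                {
                    // Wait returns false on timeout and throws if the connect attempt failed.
                    bool finished = client.ConnectAsync(host, port).Wait(timeoutMs);
                    return finished && client.Connected;
                }
            }
            catch (AggregateException)
            {
                return false; // wraps the SocketException from a refused or failed connect
            }
            catch (SocketException)
            {
                return false;
            }
        }
    }
    ```

    For example, `PortCheck.CanConnect("example.com", 80, 3000)` gives the attempt three seconds before reporting failure.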

  • 2020-12-06 13:46

    If HEAD returns a 405, that means the server doesn't support HEAD (at least for that URL) and you'll have to fall back to GET instead. The majority of sites should support HEAD, so you probably want to do HEAD by default, but if it throws a 405, you could fall back to GET for that domain. Or maybe you want to try HEAD first for each request; YMMV.
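
    The HEAD-first, GET-on-405 idea could look roughly like this with HttpWebRequest. The class and method names are hypothetical, and this version retries per request rather than remembering which domains rejected HEAD.

    ```csharp
    using System;
    using System.Net;

    // Hypothetical helper, not part of the original question.
    static class HeadWithFallback
    {
        // True when the failure carries an HTTP 405 (Method Not Allowed) response.
        public static bool IsMethodNotAllowed(WebException ex)
        {
            var response = ex.Response as HttpWebResponse;
            return response != null &&
                   response.StatusCode == HttpStatusCode.MethodNotAllowed;
        }

        // Issues one request and returns its status without reading the body.
        static HttpStatusCode Send(string url, string method)
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.Method = method;
            request.Timeout = 10000; // milliseconds
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                return response.StatusCode;
            }
        }

        // Tries HEAD first and repeats the request as GET only if the server answers 405.
        public static HttpStatusCode Probe(string url)
        {
            try
            {
                return Send(url, "HEAD");
            }
            catch (WebException ex)
            {
                using (WebResponse failed = ex.Response) // may be null; using tolerates that
                {
                    if (!IsMethodNotAllowed(ex)) throw;
                }
                return Send(url, "GET");
            }
        }
    }
    ```

    Any other failure (404, 500, a timeout) is rethrown unchanged, so the caller still sees the original WebException. The GET response is disposed without reading the body, which keeps the transfer small but does not guarantee the server sent nothing.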

    If the server requires GET and you want to reduce network traffic, you could try doing a conditional GET (an If-Modified-Since header) and/or a partial GET (a Range header); see e.g. RFC 2616. I've never tried doing those with WebRequest, but I think it lets you add custom outgoing HTTP headers, so you should be able to do it.
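
    For what it's worth, HttpWebRequest exposes both of these directly: `AddRange` sets the Range header and `IfModifiedSince` sets the conditional header. A sketch under those assumptions follows; `CheapGet` and its methods are illustrative names only.

    ```csharp
    using System;
    using System.Net;

    // Hypothetical helper, not part of the original question.
    static class CheapGet
    {
        // Builds a GET that asks for only the first byte and, optionally,
        // only if the resource changed since lastSeen.
        public static HttpWebRequest Build(string url, DateTime? lastSeen)
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.Method = "GET";
            request.AddRange(0, 0); // sends "Range: bytes=0-0"
            if (lastSeen.HasValue)
                request.IfModifiedSince = lastSeen.Value;
            return request;
        }

        // Returns the status code: typically 206 if the range was honored,
        // 200 if the server ignored it, 304 if nothing changed.
        public static HttpStatusCode Probe(string url, DateTime? lastSeen)
        {
            try
            {
                using (var response = (HttpWebResponse)Build(url, lastSeen).GetResponse())
                {
                    return response.StatusCode;
                }
            }
            catch (WebException ex)
            {
                // HttpWebRequest reports 304 (and 4xx/5xx) as a WebException.
                using (var response = ex.Response as HttpWebResponse)
                {
                    if (response == null) throw; // no HTTP response at all, e.g. a timeout
                    return response.StatusCode;
                }
            }
        }
    }
    ```

    A server is free to ignore the Range header and send the whole body with a 200, so this reduces traffic only where ranges are supported.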

    Also, don't forget that, if you're writing a spider (which you clearly are), you should respect the server's robots.txt, and it's also courteous to throttle your requests to something like one request every two seconds, so you don't slashdot the server.
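
    The throttling part can be sketched as a small per-host gate like the one below. `HostThrottle` is a hypothetical class, the two-second interval is just the figure suggested above, and robots.txt handling is not shown.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Threading;

    // Hypothetical helper, not part of the original question.
    sealed class HostThrottle
    {
        private readonly TimeSpan _minInterval;
        private readonly Dictionary<string, DateTime> _nextAllowed =
            new Dictionary<string, DateTime>(StringComparer.OrdinalIgnoreCase);
        private readonly object _gate = new object();

        public HostThrottle(TimeSpan minInterval)
        {
            _minInterval = minInterval;
        }

        // Reserves the next free slot for the host and returns how long
        // the caller has to wait before using it.
        public TimeSpan Reserve(string host, DateTime nowUtc)
        {
            lock (_gate)
            {
                DateTime next;
                if (!_nextAllowed.TryGetValue(host, out next) || next < nowUtc)
                    next = nowUtc;
                _nextAllowed[host] = next + _minInterval;
                return next - nowUtc;
            }
        }

        // Blocks the calling thread until a request to url's host is allowed.
        public void Wait(Uri url)
        {
            TimeSpan delay = Reserve(url.Host, DateTime.UtcNow);
            if (delay > TimeSpan.Zero)
                Thread.Sleep(delay);
        }
    }
    ```

    A spider would create one `new HostThrottle(TimeSpan.FromSeconds(2))` and call `Wait(uri)` before each request; requests to different hosts do not delay each other.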
