Download HTTP thru sockets (C)

悲哀的现实 2021-02-06 08:06

Recently I started working through this guide to learn how to download files from the internet. I read it and came up with the following code to download the HTTP body of a w

3 Answers
  •  一整个雨季
    2021-02-06 09:00

    If you want to grab files using HTTP, then libcurl is probably your best bet in C. However, if you are using this as a way to learn network programming, then you will have to learn a bit more about HTTP before you can retrieve a file.

    What you are seeing in your current program is that you need to send an explicit request for the file before you can retrieve it. I would start by reading through RFC2616. Don't try to understand it all - it is a lot to read for this example. Read the first section to get an understanding of how HTTP works, then read sections 4, 5, and 6 to understand the basic message format.
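
    In socket terms, that means writing the request bytes to the connection first and only then reading until the server closes it. Here is a rough sketch of those steps using POSIX sockets. The function names (connect_to_host, send_all, recv_until_close) are made up for illustration, and the sketch assumes the request asks the server to close the connection when it is done:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <netdb.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Resolve host and open a TCP connection to port ("80" for plain HTTP).
 * Hypothetical helper. Returns a socket descriptor, or -1 on failure. */
int connect_to_host(const char *host, const char *port)
{
    struct addrinfo hints, *res, *p;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    for (p = res; p != NULL; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0)
            break;                    /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

/* send() may accept fewer bytes than requested, so loop until all of
 * buf has been written. Returns 0 on success, -1 on error. */
int send_all(int fd, const char *buf, size_t len)
{
    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = send(fd, buf, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read until the peer closes the connection (recv() returns 0).
 * Returns a malloc'd, NUL-terminated buffer and stores its length in
 * *out_len, or returns NULL on error. The caller frees the buffer. */
char *recv_until_close(int fd, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    char *data = malloc(cap);

    if (data == NULL)
        return NULL;
    for (;;) {
        if (len + 1 >= cap) {         /* keep room for the final NUL */
            char *grown = realloc(data, cap * 2);
            if (grown == NULL) {
                free(data);
                return NULL;
            }
            data = grown;
            cap *= 2;
        }
        ssize_t n = recv(fd, data + len, cap - len - 1, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            free(data);
            return NULL;
        }
        if (n == 0)
            break;                    /* server closed the connection */
        len += (size_t)n;
    }
    data[len] = '\0';
    *out_len = len;
    return data;
}
```

    A caller would connect with connect_to_host("stackoverflow.com", "80"), pass the request text to send_all, and then call recv_until_close to collect the headers and body in one buffer. Reading until close only works if the server really does close the connection.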

    Here is an example of what an HTTP request for the stackoverflow Questions page looks like:

    GET http://stackoverflow.com/questions HTTP/1.1\r\n
    Host: stackoverflow.com:80\r\n
    Connection: close\r\n
    Accept-Encoding: identity, *;q=0\r\n
    \r\n
    

    I believe that is a minimal request. I added the CRLFs explicitly to show that a blank line is used to terminate the request header block as described in RFC2616. If you leave out the Accept-Encoding header, then the response body will probably be transferred as a gzip-compressed stream, since HTTP explicitly allows this unless you tell the server that you do not want it.
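
    As a sketch, the request above could be assembled in C with snprintf. The name build_get_request and its parameters are hypothetical. One deliberate difference: this version sends only the path (/questions) on the request line, which is the form clients normally use when talking directly to the server rather than through a proxy, and it leaves the port out of the Host header since 80 is the default:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a minimal HTTP/1.1 GET request into buf.
 * Hypothetical helper; host and path are supplied by the caller.
 * Returns the request length in bytes, or -1 if buf is too small. */
int build_get_request(char *buf, size_t cap,
                      const char *host, const char *path)
{
    assert(buf != NULL && host != NULL && path != NULL);
    int n = snprintf(buf, cap,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Connection: close\r\n"
                     "Accept-Encoding: identity, *;q=0\r\n"
                     "\r\n",           /* blank line ends the header block */
                     path, host);
    if (n < 0 || (size_t)n >= cap)
        return -1;                     /* formatting error or truncation */
    return n;
}
```

    Calling build_get_request(buf, sizeof buf, "stackoverflow.com", "/questions") fills buf with the five lines shown earlier, each ending in a real CRLF, ready to be written to the socket.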

    The server response also contains HTTP headers for the meta-data describing the response. Here is an example of a response from the previous request:

    HTTP/1.1 200 OK\r\n
    Server: nginx\r\n
    Date: Sun, 01 Aug 2010 13:54:56 GMT\r\n
    Content-Type: text/html; charset=utf-8\r\n
    Connection: close\r\n
    Cache-Control: private\r\n
    Content-Length: 49731\r\n
    \r\n
    ... 49731 bytes of response body ...
    

    This simple example should give you an idea of what you are getting into if you want to grab files using HTTP. This is the best-case, simplest example. This isn't something that I would undertake lightly, but it is probably the best way to learn and appreciate HTTP.
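
    To make the response format concrete, here is a minimal sketch of pulling a response like the one above apart: find the blank line that ends the headers, read the status code, and look up Content-Length. All three function names are invented for this example. It assumes the whole response sits in one NUL-terminated buffer and it does not handle chunked transfer encoding, which a real server may well use:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the offset of the first body byte, or -1 if the blank line
 * (CRLF CRLF) that ends the header block has not been seen. */
long find_body_offset(const char *resp, size_t len)
{
    assert(resp != NULL);
    for (size_t i = 0; i + 4 <= len; i++) {
        if (memcmp(resp + i, "\r\n\r\n", 4) == 0)
            return (long)(i + 4);
    }
    return -1;
}

/* Parse the code from a status line such as "HTTP/1.1 200 OK".
 * resp must be NUL-terminated. Returns -1 if the line is malformed. */
int parse_status_code(const char *resp)
{
    int major, minor, code;
    if (sscanf(resp, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    return code;
}

/* Header field names are case-insensitive, so compare without case. */
static int has_prefix_nocase(const char *s, size_t s_len, const char *prefix)
{
    size_t n = strlen(prefix);
    if (s_len < n)
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (tolower((unsigned char)s[i]) != tolower((unsigned char)prefix[i]))
            return 0;
    }
    return 1;
}

/* Scan the header block (the first hdr_len bytes of resp) line by line
 * for Content-Length. Returns its value, or -1 if the header is absent. */
long parse_content_length(const char *resp, size_t hdr_len)
{
    const char *line = resp;
    const char *end = resp + hdr_len;

    while (line < end) {
        if (has_prefix_nocase(line, (size_t)(end - line), "Content-Length:"))
            return strtol(line + strlen("Content-Length:"), NULL, 10);
        const char *eol = memchr(line, '\n', (size_t)(end - line));
        if (eol == NULL)
            break;
        line = eol + 1;               /* start of the next header line */
    }
    return -1;
}
```

    With the offset from find_body_offset you know where the body starts, and Content-Length (when present) tells you how many bytes to expect after that point, so you can check that the download was complete.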

    If you are looking for a simple way to learn network programming, this is a decent way to start. I would recommend picking up a copy of TCP/IP Illustrated, Volume 1 and UNIX Network Programming, Volume 1. These are probably the best way to really learn how to write network-based applications. I would probably start by writing an FTP client since FTP is a much simpler protocol to start with.

    If you are trying to learn the details associated with HTTP, then:

    1. Buy HTTP: the Definitive Guide and read it
    2. Read RFC2616 until you understand it
      • Try examples using telnet server 80 and typing in requests by hand
      • Download the cURL client and use the --verbose and --include command line options so that you can see what is happening
    3. Read Fielding's dissertation until HTTP really makes sense.

    Just don't plan on writing your own HTTP client for enterprise use. You do not want to do that, trust me as one who has been maintaining such a mistake for a little while now...
