Recently I started taking this guide to get myself started on downloading files from the internet. I read it and came up with the following code to download the HTTP body of a w
If you want to grab files using HTTP, then libcURL is probably your best bet in C. However, if you are using this as a way to learn network programming, then you are going to have to learn a bit more about HTTP before you can retrieve a file.
What you are seeing in your current program is that you need to send an explicit request for the file before you can retrieve it. I would start by reading through RFC2616. Don't try to understand it all - it is a lot to read for this example. Read the first section to get an understanding of how HTTP works, then read sections 4, 5, and 6 to understand the basic message format.
Here is an example of what an HTTP request for the stackoverflow Questions page looks like:
GET http://stackoverflow.com/questions HTTP/1.1\r\n
Host: stackoverflow.com:80\r\n
Connection: close\r\n
Accept-Encoding: identity, *;q=0\r\n
\r\n
I believe that is a minimal request. I added the CRLFs explicitly to show that a blank line is used to terminate the request header block as described in RFC2616. If you leave out the Accept-Encoding header, then the result document will probably be transfered as a gzip-compressed stream since HTTP allows for this explicitly unless you tell the server that you do not want it.
The server response also contains HTTP headers for the meta-data describing the response. Here is an example of a response from the previous request:
HTTP/1.1 200 OK\r\n
Server: nginx\r\n
Date: Sun, 01 Aug 2010 13:54:56 GMT\r\n
Content-Type: text/html; charset=utf-8\r\n
Connection: close\r\n
Cache-Control: private\r\n
Content-Length: 49731\r\n
\r\n
\r\n
\r\n
This simple example should give you an idea what you are getting into implementing if you want to grab files using HTTP. This is the best case, most simple example. This isn't something that I would undertake lightly, but it is probably the best way to learn and appreciate HTTP.
If you are looking for a simple way to learn network programming, this is a decent way to start. I would recommend picking up a copy of TCP/IP Illustrated, Volume 1 and UNIX Network Programming, Volume 1. These are probably the best way to really learn how to write network-based applications. I would probably start by writing an FTP client since FTP is a much simpler protocol to start with.
If you are trying to learn the details associated with HTTP, then:
telnet server 80
and typing in requests by hand--verbose
and --include
command line options so that you can see what is happeningJust don't plan on writing your own HTTP client for enterprise use. You do not want to do that, trust me as one who has been maintaining such a mistake for a little while now...