What direction should I go in (libraries, documents)?
UPDATE
Can someone illustrate how to use winpcap to do the job?
You may want to look at the source code of tcpdump to see how it works. tcpdump is a Linux command-line utility that monitors and prints network activity. You need root access to the machine to use it, though.
Try WinPcap: http://www.winpcap.org/
It may sound like overkill but the Web proxy/cache server Squid does exactly that. A few years ago my company used it and I had to tweak the code locally to provide some special warnings when certain URLs were accessed so I know it can do what you want. You just need to find the code you want and pull it out for your project. I used version 2.X and I see they're up to 3.X now but I suspect that aspect of the code hasn't changed much internally.
You didn't say whether Windows is a 'requirement' or a 'preference', but according to the site, http://www.squid-cache.org/, they can do both.
If by "hijack" you meant sniffing the packets, then this is how to do it with WinPcap:
Find the device you want to use - See WinPcap tutorial.
Open a device using pcap_open
// Open the device
char errorBuffer[PCAP_ERRBUF_SIZE];
pcap_t *pcapDescriptor = pcap_open(source, // name of the device
snapshotLength, // portion of the packet to capture
// 65536 guarantees that the whole packet will be captured on all the link layers
attributes, // 0 for no flags, 1 for promiscuous
readTimeout, // read timeout
NULL, // authentication on the remote machine
errorBuffer); // error buffer
Use a function that reads packets from the descriptor like pcap_loop
int result = pcap_loop(pcapDescriptor, count, functionPointer, NULL);
This loops until an error occurs, the requested count of packets has been processed, or the loop is broken with a call to pcap_breakloop. It calls functionPointer for each packet.
In the function pointed to, implement something that parses the packets. It should have the pcap_handler signature:
typedef void (*pcap_handler)(u_char *, const struct pcap_pkthdr *,
const u_char *);
Now all that is left is to parse the packets. Their buffer is the const u_char* argument, and their captured length is in the caplen field of the pcap_pkthdr structure.
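Assuming you have HTTP GET over TCP over IPv4 over Ethernet packets, you can:
Skip the 14 bytes of the Ethernet header.
Skip the IPv4 header. Its length, in 32-bit words, is in the low 4 bits of its first byte.
Skip the TCP header. Its length, in 32-bit words, is in the high 4 bits of its byte at offset 12.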
The rest of the packet should be the HTTP text. The text between the first and second space should be the URI. If it's too long you might need to do some TCP reconstruction, but most URIs are small enough to fit in one packet.
UPDATE: In code this would look like this (I wrote it without testing it):
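int tcp_len, url_length, payload_len;
const u_char *url, *end_url, *tcp_payload;
char *final_url;
... /* code in http://www.winpcap.org/docs/docs_40_2/html/group__wpcap__tut6.html */
/* retrieve the position of the TCP header */
ip_len = (ih->ver_ihl & 0xf) * 4;
/* retrieve the position of the TCP payload (header length is the high nibble of TCP byte 12) */
tcp_len = (((const u_char*)ih)[ip_len + 12] >> 4) * 4;
tcp_payload = (const u_char*)ih + ip_len + tcp_len;
/* captured bytes left after the Ethernet (14 bytes), IP and TCP headers;
   header is the pcap_pkthdr argument of the handler */
payload_len = (int)header->caplen - 14 - (int)ip_len - tcp_len;
if (payload_len <= 4 || memcmp(tcp_payload, "GET ", 4) != 0)
    return; /* not the start of a GET request */
/* start of URL - skip "GET " */
url = tcp_payload + 4;
/* length of URL - look for the next space; the packet buffer is not
   null terminated, so use a bounded memchr instead of strchr */
end_url = (const u_char*)memchr(url, ' ', payload_len - 4);
if (end_url == NULL)
    return; /* the URL continues in a later TCP segment */
url_length = (int)(end_url - url);
/* copy the URL to a null-terminated C string */
final_url = (char*)malloc(url_length + 1);
if (final_url == NULL)
    return;
memcpy(final_url, url, url_length);
final_url[url_length] = '\0';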
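The snippet above depends on variables from the tutorial's handler. As a rough, self-contained sketch of the same parsing, the steps can be wrapped in one bounds-checked function that takes the raw frame and its captured length. The name extract_get_uri and its signature are my own assumptions, not part of WinPcap, and it assumes plain Ethernet II framing (no VLAN tag) and a URI that fits in a single segment.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a WinPcap function).
 * frame/caplen: the captured bytes of one Ethernet frame, as a pcap handler
 * receives them. Copies the URI of an HTTP GET into out, null terminated.
 * Returns the URI length, or -1 if the frame is not IPv4/TCP starting with
 * "GET ", if the URI is not complete in this frame, or if out is too small.
 */
int extract_get_uri(const unsigned char *frame, size_t caplen,
                    char *out, size_t out_size)
{
    const size_t eth_len = 14;              /* Ethernet II header */
    size_t ip_len, tcp_len, offset, uri_len;
    const unsigned char *ip, *tcp, *payload, *end;

    if (caplen < eth_len + 20)
        return -1;
    if (frame[12] != 0x08 || frame[13] != 0x00)   /* EtherType 0x0800 = IPv4 */
        return -1;

    ip = frame + eth_len;
    if ((ip[0] >> 4) != 4 || ip[9] != 6)    /* IP version 4, protocol 6 = TCP */
        return -1;
    ip_len = (size_t)(ip[0] & 0x0f) * 4;    /* header length in 32-bit words */
    if (ip_len < 20 || caplen < eth_len + ip_len + 20)
        return -1;

    tcp = ip + ip_len;
    tcp_len = (size_t)(tcp[12] >> 4) * 4;   /* data offset in 32-bit words */
    offset = eth_len + ip_len + tcp_len;
    if (tcp_len < 20 || caplen < offset + 4)
        return -1;

    payload = frame + offset;
    if (memcmp(payload, "GET ", 4) != 0)
        return -1;

    /* The packet buffer is not null terminated, so search a bounded range. */
    end = (const unsigned char *)memchr(payload + 4, ' ', caplen - offset - 4);
    if (end == NULL)
        return -1;                          /* URI continues in a later segment */

    uri_len = (size_t)(end - (payload + 4));
    if (uri_len + 1 > out_size)
        return -1;
    memcpy(out, payload + 4, uri_len);
    out[uri_len] = '\0';
    return (int)uri_len;
}
```

Inside a pcap_handler you would call it as extract_get_uri(pkt_data, header->caplen, buf, sizeof buf), where pkt_data and header are the handler's arguments. Writing into a caller-supplied buffer avoids a malloc per packet.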
You can also capture only HTTP traffic by creating and setting a BPF filter. See the WinPcap tutorial. You should probably use the filter "tcp and dst port 80", which would give you only the requests your computer sends to the server.
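To make it concrete what that filter selects, here is a hand-written approximation of the same test on a raw frame. This is only a sketch, not how BPF itself is implemented: the function name is made up, it assumes Ethernet II framing, and it handles IPv4 only, whereas the real filter expression also covers TCP over IPv6 and takes IP fragmentation into account.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a WinPcap function).
 * Returns 1 if the captured frame is IPv4 + TCP with destination port 80,
 * otherwise 0. Simplified: ignores IPv6, VLAN tags and IP fragments.
 */
int is_tcp_dst_port_80(const unsigned char *frame, size_t caplen)
{
    size_t ip_len;
    const unsigned char *ip, *tcp;
    unsigned dst_port;

    if (caplen < 14 + 20)
        return 0;
    if (frame[12] != 0x08 || frame[13] != 0x00)   /* EtherType 0x0800 = IPv4 */
        return 0;

    ip = frame + 14;
    if ((ip[0] >> 4) != 4 || ip[9] != 6)          /* version 4, protocol TCP */
        return 0;
    ip_len = (size_t)(ip[0] & 0x0f) * 4;          /* header length in bytes */
    if (ip_len < 20 || caplen < 14 + ip_len + 4)
        return 0;

    tcp = ip + ip_len;
    /* Destination port is bytes 2-3 of the TCP header, in network byte order. */
    dst_port = ((unsigned)tcp[2] << 8) | tcp[3];
    return dst_port == 80;
}
```

In practice, letting WinPcap apply the filter is preferable, because packets that do not match are never delivered to your handler.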
If you don't mind using C#, you can try using Pcap.Net, which would do all that for you much more easily, including the parsing of Ethernet, IPv4 and TCP parts of the packet.