HTTP packet reconstruction

懵懂的女人 submitted on 2019-11-27 21:37:34

OK, I worked out how to do this (dodgy, but it gets the job done).

It is simple to strip away the Ethernet, IP, and TCP headers, leaving you with the 'raw' data message. Looking inside the message, it is easy to detect whether it is the start of an HTTP message: a response begins with a status line such as "HTTP/1.1 200 OK", and a request begins with a request line such as "GET /index.html HTTP/1.1". This indicates the packet is the start of an HTTP stream/larger message/whatever. You can also do some simple parsing to read the "Content-Length" header, which gives the length of the message body (the bytes after the blank line that ends the headers), not the length of the entire HTTP message.
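A minimal Python sketch of that start-of-message check, assuming the whole header block fits in the first segment. The regex and the function name `parse_http_start` are illustrative, not from any library; the function returns the header length (including the blank line) and the Content-Length value, or None for Content-Length when the header is absent.

```python
import re

# Simplified patterns (an assumption, not a full grammar):
# request line  "GET /index.html HTTP/1.1", status line "HTTP/1.1 200 OK".
START_LINE = re.compile(rb"^(?:[A-Z]+ \S+ HTTP/1\.[01]|HTTP/1\.[01] \d{3})")

def parse_http_start(payload: bytes):
    """Return (header_length, content_length) if payload starts an HTTP
    message with complete headers, else None."""
    if not START_LINE.match(payload):
        return None
    end = payload.find(b"\r\n\r\n")
    if end == -1:
        return None  # headers continue in a later segment; not handled here
    content_length = None
    lines = payload[:end].decode("iso-8859-1").split("\r\n")
    for line in lines[1:]:  # skip the request/status line
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            content_length = int(value.strip())
    return end + 4, content_length

print(parse_http_start(b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhel"))  # prints (38, 5)
```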

You can also use the source/destination IP addresses and port numbers to form a unique ID for the connection. So after receiving the header packet, take note of these four values (SRCIP, SRCPORT, DESTIP, DESTPORT). Next time you receive a packet matching this IP/port combination, you can check whether it's the next part of the HTTP message. You can use the sequence numbers to do some validation and probably other stuff, but generally the packets arrive in order, so it's OK. I think a new port is opened for each HTTP stream, so you shouldn't receive random packets that aren't part of the stream, but this could be an area prone to error: for example, HTTP/1.1 persistent (keep-alive) connections carry several request/response exchanges over one connection.
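The bookkeeping described above might look like the following sketch. `FlowKey`, `FlowState` and `on_segment` are hypothetical names; the sketch simply drops any segment whose sequence number is not the one expected next, which makes the "packets are generally in order" assumption explicit.

```python
from typing import Dict, NamedTuple

class FlowKey(NamedTuple):
    """One direction of one TCP connection (hypothetical helper type)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

class FlowState:
    def __init__(self, next_seq: int) -> None:
        self.next_seq = next_seq   # sequence number we expect next
        self.data = bytearray()    # HTTP bytes collected so far

def on_segment(flows: Dict[FlowKey, FlowState], key: FlowKey, seq: int,
               payload: bytes, is_http_start: bool) -> None:
    if is_http_start:
        # Start (or restart) tracking this flow at the header packet.
        flows[key] = FlowState(seq)
    state = flows.get(key)
    if state is None:
        return                     # not a flow we are following
    if seq != state.next_seq:
        return                     # duplicate or out of order: dropped in this sketch
    state.data += payload
    state.next_seq = (seq + len(payload)) % (1 << 32)  # sequence numbers wrap at 2**32
```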

Anyway, once you receive this packet, once again strip away the headers and get the raw message. Add it onto the already known part of the message. If the number of body bytes received so far (everything after the blank line that ends the HTTP headers) equals the value read from the "Content-Length" field, the message is complete!
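A rough sketch of the completeness test, assuming Content-Length framing only (a chunked response carries no Content-Length, and this sketch raises in that case). `is_complete` is a hypothetical helper; it counts only the body bytes after the blank line, because Content-Length does not include the headers.

```python
def is_complete(message: bytes) -> bool:
    """True once the body after the blank line reaches Content-Length."""
    header_end = message.find(b"\r\n\r\n")
    if header_end == -1:
        return False  # headers not finished yet
    length = None
    for line in message[:header_end].decode("iso-8859-1").split("\r\n")[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("no Content-Length header; this sketch cannot decide")
    body_received = len(message) - (header_end + 4)
    return body_received >= length

# Accumulate payloads as they arrive and test after each one.
msg = bytearray()
for part in (b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhe", b"llo"):
    msg += part
    print(is_complete(bytes(msg)))  # prints False, then True
```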

This method is obviously prone to a huge number of errors, but I am not after an extremely robust way of doing it. I thought I would answer my own question in case someone else comes across the same issue in the future! Good luck with your sniffing :D

You should not be using any information from the TCP level to determine HTTP request boundaries. TCP provides a reliable byte stream service; it has no fields or flags that mark where one application message ends and the next begins.

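To determine where the boundaries are in an HTTP request, you should follow RFC 2616 (since superseded by RFC 7230 and, more recently, RFC 9112). The boundaries are well defined: section 4.4 of RFC 2616 derives the message length mainly from the Content-Length header, from chunked Transfer-Encoding, or, for a response with neither, from the server closing the connection. You can determine them by parsing the data you receive.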
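As a sketch of what that parsing involves, the hypothetical function `message_length` below returns the byte length of the first complete message in a buffer, or None if more data is needed. It handles only Content-Length and chunked framing. It treats a message with neither as having no body, which is right for most requests but not for responses delimited by connection close, and it ignores the special no-body cases (HEAD, 1xx, 204 and 304 responses).

```python
from typing import Dict, Optional

def _headers(buf: bytes, end: int) -> Dict[str, str]:
    fields = {}
    for line in buf[:end].decode("iso-8859-1").split("\r\n")[1:]:
        name, _, value = line.partition(":")
        fields[name.strip().lower()] = value.strip()
    return fields

def message_length(buf: bytes) -> Optional[int]:
    end = buf.find(b"\r\n\r\n")
    if end == -1:
        return None                      # headers incomplete
    pos = end + 4                        # first byte of the body
    fields = _headers(buf, end)
    if "chunked" in fields.get("transfer-encoding", "").lower():
        while True:
            line_end = buf.find(b"\r\n", pos)
            if line_end == -1:
                return None
            size = int(buf[pos:line_end].split(b";")[0], 16)  # hex size, drop extensions
            pos = line_end + 2
            if size == 0:
                break                    # last-chunk
            pos += size + 2              # chunk data plus its trailing CRLF
            if pos > len(buf):
                return None
        while True:                      # optional trailer lines, then an empty line
            line_end = buf.find(b"\r\n", pos)
            if line_end == -1:
                return None
            if line_end == pos:
                return pos + 2
            pos = line_end + 2
    if "content-length" in fields:
        total = pos + int(fields["content-length"])
        return total if len(buf) >= total else None
    return pos                           # assumption: no body
```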

In each TCP packet, the payload data starts immediately after the TCP header and ends at the end of the IP packet (as given by the IP Total Length field).

The end of the TCP header is easily found - the Data Offset is a 4-bit field in the header that contains the length of the header in 32-bit words (so multiply it by 4 to get the length in 8-bit bytes).
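Putting the last two paragraphs together, a minimal sketch for a raw IPv4 packet (Ethernet header already removed): the IP header length comes from the IHL field, the end of the packet from Total Length, and the TCP header length from Data Offset. Using Total Length rather than the buffer length also drops any Ethernet padding on short frames. IPv6 and IP fragmentation are out of scope, and `tcp_payload` is a hypothetical name.

```python
import struct

def tcp_payload(ip_packet: bytes) -> bytes:
    """Return the TCP payload of a raw IPv4 packet."""
    version_ihl = ip_packet[0]
    if version_ihl >> 4 != 4:
        raise ValueError("not IPv4")
    ip_header_len = (version_ihl & 0x0F) * 4                # IHL counts 32-bit words
    total_length = struct.unpack("!H", ip_packet[2:4])[0]   # IP header + data, in bytes
    if ip_packet[9] != 6:                                   # protocol 6 = TCP
        raise ValueError("not TCP")
    tcp = ip_packet[ip_header_len:total_length]
    data_offset = (tcp[12] >> 4) * 4                        # also 32-bit words
    return tcp[data_offset:]
```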

Use the TCP sequence numbers from the Sequence field to string the payloads together in the right order. Note that there might be duplicates, in the case of retransmissions.
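One way to do this, sketched under the assumption that all segments for one direction of one connection have been collected together with the sequence number of the first payload byte (`reassemble` is a hypothetical helper): sort by offset relative to that first sequence number, which also copes with 32-bit wraparound, skip bytes already seen, and stop at the first gap.

```python
from typing import Iterable, Tuple

MOD = 1 << 32  # TCP sequence numbers are 32-bit and wrap around

def reassemble(segments: Iterable[Tuple[int, bytes]], first_seq: int) -> bytes:
    # Offsets relative to first_seq sort correctly across wraparound
    # (valid while the captured stream is shorter than 2**31 bytes).
    rel = sorted(((seq - first_seq) % MOD, data) for seq, data in segments)
    out = bytearray()
    for offset, data in rel:
        end = offset + len(data)
        if end <= len(out):
            continue                    # duplicate (retransmission) of bytes we have
        if offset > len(out):
            break                       # gap: a segment is missing from the capture
        out += data[len(out) - offset:]  # append only the new, non-overlapping bytes
    return bytes(out)
```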

TCP is a stream protocol, not a packet protocol. The application layer (i.e., you) gets a stream of data, not a bunch of packets. You just keep reading bytes in from the stream and you'll get your entire HTTP payload, while TCP does the error checking, resends, etc. underneath.
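For comparison, this is what the stream view looks like when your program is an endpoint of the connection and reads from a socket, so the operating system's TCP stack has already reordered and de-duplicated the data. `recv_http_message` is a hypothetical sketch that handles Content-Length framing only; the amount each `recv` call returns does not necessarily match how the data was split into segments on the wire.

```python
import socket

def recv_http_message(sock: socket.socket) -> bytes:
    """Read one HTTP message (Content-Length framing only) from a socket."""
    buf = bytearray()
    while b"\r\n\r\n" not in buf:       # read until the headers are complete
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("connection closed before headers ended")
        buf += chunk
    header_end = buf.index(b"\r\n\r\n") + 4
    length = 0                          # assumption: no Content-Length means no body
    for line in bytes(buf[:header_end]).decode("iso-8859-1").split("\r\n")[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            length = int(value.strip())
    while len(buf) < header_end + length:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("connection closed mid-body")
        buf += chunk
    return bytes(buf[:header_end + length])
```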

You can use the code of the open source project Xplico: http://www.xplico.org

We had to solve the same problem. We were able to extract some of the core functionality into an open source project.

http://code.google.com/p/pcap-reconst/

Please do check it out and let me know if it helps you out.