I working on HTTP Traffic Data set which is composed of complete POST and GET request Like given below. I have written code in java that has separated each of these request
Here is a General Http Request Parser For all Method types (GET, POST, etc.) for your convinience:
package util.dpi.capture;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Hashtable;
/**
* Class for HTTP request parsing as defined by RFC 2612:
*
* Request = Request-Line ; Section 5.1 (( general-header ; Section 4.5 |
* request-header ; Section 5.3 | entity-header ) CRLF) ; Section 7.1 CRLF [
* message-body ] ; Section 4.3
*
* @author izelaya
*
*/
public class HttpRequestParser {
private String _requestLine;
private Hashtable<String, String> _requestHeaders;
private StringBuffer _messagetBody;
public HttpRequestParser() {
_requestHeaders = new Hashtable<String, String>();
_messagetBody = new StringBuffer();
}
/**
* Parse and HTTP request.
*
* @param request
* String holding http request.
* @throws IOException
* If an I/O error occurs reading the input stream.
* @throws HttpFormatException
* If HTTP Request is malformed
*/
public void parseRequest(String request) throws IOException, HttpFormatException {
BufferedReader reader = new BufferedReader(new StringReader(request));
setRequestLine(reader.readLine()); // Request-Line ; Section 5.1
String header = reader.readLine();
while (header.length() > 0) {
appendHeaderParameter(header);
header = reader.readLine();
}
String bodyLine = reader.readLine();
while (bodyLine != null) {
appendMessageBody(bodyLine);
bodyLine = reader.readLine();
}
}
/**
*
* 5.1 Request-Line The Request-Line begins with a method token, followed by
* the Request-URI and the protocol version, and ending with CRLF. The
* elements are separated by SP characters. No CR or LF is allowed except in
* the final CRLF sequence.
*
* @return String with Request-Line
*/
public String getRequestLine() {
return _requestLine;
}
private void setRequestLine(String requestLine) throws HttpFormatException {
if (requestLine == null || requestLine.length() == 0) {
throw new HttpFormatException("Invalid Request-Line: " + requestLine);
}
_requestLine = requestLine;
}
private void appendHeaderParameter(String header) throws HttpFormatException {
int idx = header.indexOf(":");
if (idx == -1) {
throw new HttpFormatException("Invalid Header Parameter: " + header);
}
_requestHeaders.put(header.substring(0, idx), header.substring(idx + 1, header.length()));
}
/**
* The message-body (if any) of an HTTP message is used to carry the
* entity-body associated with the request or response. The message-body
* differs from the entity-body only when a transfer-coding has been
* applied, as indicated by the Transfer-Encoding header field (section
* 14.41).
* @return String with message-body
*/
public String getMessageBody() {
return _messagetBody.toString();
}
private void appendMessageBody(String bodyLine) {
_messagetBody.append(bodyLine).append("\r\n");
}
/**
* For list of available headers refer to sections: 4.5, 5.3, 7.1 of RFC 2616
* @param headerName Name of header
* @return String with the value of the header or null if not found.
*/
public String getHeaderParam(String headerName){
return _requestHeaders.get(headerName);
}
}
I [am] working on [an] HTTP Traffic Data set which is composed of complete POST and GET request[s]
So you want to parse a file or list that contains multiple HTTP requests. What data do you want to extract? Anyway here is a Java HTTP parsing class, which can read the method, version and URI used in the request-line, and that reads all headers into a Hashtable.
You can use that one or write one yourself if you feel like reinventing the wheel. Take a look at the RFC to see what a request looks like in order to parse it correctly:
Request = Request-Line ; Section 5.1
*(( general-header ; Section 4.5
| request-header ; Section 5.3
| entity-header ) CRLF) ; Section 7.1
CRLF
[ message-body ] ; Section 4.3
If you just want to send the raw request as it is, it's very easy, just send the actual String using a TCP socket!
Something like this:
Socket socket = new Socket(host, port);
BufferedWriter out = new BufferedWriter(
new OutputStreamWriter(socket.getOutputStream(), "UTF8"));
for (String line : getContents(request)) {
System.out.println(line);
out.write(line + "\r\n");
}
out.write("\r\n");
out.flush();
See this blog post by JoeJag for the full code.
UPDATE
I started a project, RawHTTP to provide HTTP parsers for request, responses, headers etc... it turned out so good that it makes it quite easy to write HTTP servers and clients on top of it. Check it out if you're looking for something lowl level.