As I described in a previous question, I have an assignment to write a proxy server. It partially works now, but I still have a problem with handling of gzipped information.
Store it in a byte array:
byte[] buffer = new byte[???];
A more detailed process: read bytes into the buffer until you find \r\n\r\n in the buffer. You can write a helper function for this, for example static int arrayIndexOf(byte[] haystack, int offset, int length, byte[] needle).
Edit:
You are not following the steps I suggested. inputReader.ready() is the wrong way to detect the phases of the response: there is no guarantee that the header will be sent in a single burst.
I tried to sketch the steps in code (except for the arrayIndexOf function).
InputStream is; // the response stream; must be assigned before use, e.g. from the server socket
byte[] headEnd = {13, 10, 13, 10}; // \r \n \r \n
// Create a buffer large enough for the response header (and throw an exception if it is bigger).
byte[] buffer = new byte[10 * 1024];
int length = 0;
// Read bytes into the buffer until you find `\r\n\r\n` in the buffer.
int bytes = 0;
int pos;
while ((pos = arrayIndexOf(buffer, 0, length, headEnd)) == -1 && (bytes = is.read(buffer, length, buffer.length - length)) > -1) {
    length += bytes;
    // the buffer is full but the end signature has not been found
    if (length == buffer.length)
        throw new RuntimeException("Response header too long");
}
// the stream ended before the end signature showed up
if (pos == -1)
    throw new RuntimeException("Response ended before the end of the header");
// pos contains the starting index of the end signature (\r\n\r\n) so we add 4 bytes
pos += 4;
// When you encounter the end of the header, create a string from the first *n* bytes
String header = new String(buffer, 0, pos);
System.out.println(header);
// Be prepared that the buffer will contain additional data after the header
// ... so we process it
System.out.write(buffer, pos, length - pos);
// process the rest until the connection is closed (read() returns -1)
while ((bytes = is.read(buffer, 0, buffer.length)) != -1) {
    System.out.write(buffer, 0, bytes);
}
The arrayIndexOf method could look something like this (there are probably faster versions):
public static int arrayIndexOf(byte[] haystack, int offset, int length, byte[] needle) {
    // the last possible start index is offset + length - needle.length (inclusive)
    for (int i = offset; i <= offset + length - needle.length; i++) {
        boolean match = false;
        for (int j = 0; j < needle.length; j++) {
            match = haystack[i + j] == needle[j];
            if (!match)
                break;
        }
        if (match)
            return i;
    }
    return -1;
}
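To try the two snippets together without a live socket, here is a minimal self-contained sketch. The class name HeaderSplitDemo, the readHead helper, the Head holder, and the canned response are made up for illustration; the sketch also assumes the header can be decoded as ISO-8859-1.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class HeaderSplitDemo {
    static final byte[] HEAD_END = {13, 10, 13, 10}; // \r \n \r \n

    // Index of the first occurrence of needle inside
    // haystack[offset, offset + length), or -1 if it is absent.
    static int arrayIndexOf(byte[] haystack, int offset, int length, byte[] needle) {
        for (int i = offset; i <= offset + length - needle.length; i++) {
            int j = 0;
            while (j < needle.length && haystack[i + j] == needle[j]) {
                j++;
            }
            if (j == needle.length) {
                return i;
            }
        }
        return -1;
    }

    // Hypothetical holder: the header text plus any body bytes
    // that happened to be read together with the header.
    static final class Head {
        final String header;
        final byte[] leftover;

        Head(String header, byte[] leftover) {
            this.header = header;
            this.leftover = leftover;
        }
    }

    static Head readHead(InputStream is, int maxHeaderSize) throws IOException {
        byte[] buffer = new byte[maxHeaderSize];
        int length = 0;
        int pos;
        while ((pos = arrayIndexOf(buffer, 0, length, HEAD_END)) == -1) {
            if (length == buffer.length) {
                throw new IOException("Response header too long");
            }
            int n = is.read(buffer, length, buffer.length - length);
            if (n == -1) {
                throw new IOException("Stream ended before the end of the header");
            }
            length += n;
        }
        pos += HEAD_END.length; // step over \r\n\r\n
        String header = new String(buffer, 0, pos, StandardCharsets.ISO_8859_1);
        return new Head(header, Arrays.copyOfRange(buffer, pos, length));
    }

    public static void main(String[] args) throws IOException {
        byte[] response = "HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\nabc"
                .getBytes(StandardCharsets.ISO_8859_1);
        Head head = readHead(new ByteArrayInputStream(response), 10 * 1024);
        System.out.print(head.header);
        System.out.println("body bytes already buffered: " + head.leftover.length);
    }
}
```

The leftover bytes must be forwarded before anything else is read from the stream, otherwise the start of the body is lost.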
After reading the headers with BufferedReader you'll need to detect if the Content-Encoding header is set to gzip. If it is, to read the body you'll have to switch to using the InputStream and wrap it with a GZIPInputStream to decode the body. The tricky part, however, is that the BufferedReader will have buffered past the headers into the body, so the underlying InputStream will be ahead of where you need it.
What you could do is wrap the initial InputStream with a BufferedInputStream and call mark() on it (with a read limit large enough to cover everything the reader may consume) before you begin processing the headers. When you're done processing the headers, call reset(). Then read that stream until you hit the empty line between the headers and the body. Now wrap it with the GZIPInputStream to process the body.
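The mark()/reset() idea could be sketched roughly as follows. The class name MarkResetGzipDemo, the openBody helper, and the 64 KB mark limit are assumptions made for this example, and the sketch assumes the header lines end with \r\n. If the reader consumes more than the mark limit, reset() fails with an IOException.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class MarkResetGzipDemo {
    // Assumption: the headers plus whatever the reader buffers ahead fit in here.
    static final int MARK_LIMIT = 64 * 1024;

    // Returns a stream positioned at the body, decoding gzip if the headers ask for it.
    static InputStream openBody(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(MARK_LIMIT);

        // Pass 1: read the headers as text. The reader may buffer into the body.
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1));
        boolean gzip = false;
        String line;
        while ((line = reader.readLine()) != null && !line.isEmpty()) {
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.startsWith("content-encoding:") && lower.contains("gzip")) {
                gzip = true;
            }
        }

        // Rewind the byte stream to the start of the response.
        in.reset();

        // Pass 2: skip raw bytes up to and including \r\n\r\n.
        int matched = 0;
        int b;
        while (matched < 4 && (b = in.read()) != -1) {
            int expected = (matched % 2 == 0) ? '\r' : '\n';
            if (b == expected) {
                matched++;
            } else {
                matched = (b == '\r') ? 1 : 0;
            }
        }
        return gzip ? new GZIPInputStream(in) : in;
    }

    // Small helpers for the demo only.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        response.write("HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n\r\n"
                .getBytes(StandardCharsets.ISO_8859_1));
        response.write(gzip("hello".getBytes(StandardCharsets.UTF_8)));
        InputStream body = openBody(new ByteArrayInputStream(response.toByteArray()));
        System.out.println(new String(readAll(body), StandardCharsets.UTF_8));
    }
}
```

Note that a proxy that simply forwards the response does not need to decode the body at all; decoding only matters if you want to inspect or modify it.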
You basically need to parse the response headers as text, and the rest as binary. It's slightly tricky to do so, as you can't just create an InputStreamReader around the stream: that will read more data than you want. You'll quite possibly need to read data into a byte array and then decode the header part manually (in Java, with new String(bytes, offset, length, charset) rather than .NET's Encoding.GetString). Alternatively, if you've read data into a byte array already, you could always create a ByteArrayInputStream around that, then an InputStreamReader on top... but you'll need to work out how far the headers go before you get to the body of the response, which you should keep as binary data.
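As a rough illustration of the ByteArrayInputStream plus InputStreamReader route, the sketch below decodes only the header part of a byte array and leaves the rest untouched. The class name HeaderTextDemo and the parseHeaders helper are hypothetical, the header length is assumed to be known already (for example from a search for \r\n\r\n), and ISO-8859-1 is an assumed header charset.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class HeaderTextDemo {
    // Parses the first headerLength bytes of data as header text.
    // Header names are lower-cased so lookups are case-insensitive.
    static Map<String, String> parseHeaders(byte[] data, int headerLength) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data, 0, headerLength),
                StandardCharsets.ISO_8859_1));
        Map<String, String> headers = new LinkedHashMap<>();
        reader.readLine(); // skip the status line, e.g. "HTTP/1.1 200 OK"
        String line;
        while ((line = reader.readLine()) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
                headers.put(name, line.substring(colon + 1).trim());
            }
        }
        return headers;
    }

    public static void main(String[] args) throws IOException {
        String head = "HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\nContent-Type: image/png\r\n\r\n";
        byte[] headBytes = head.getBytes(StandardCharsets.ISO_8859_1);
        // Append two arbitrary binary bytes to stand in for the body.
        byte[] data = new byte[headBytes.length + 2];
        System.arraycopy(headBytes, 0, data, 0, headBytes.length);
        data[headBytes.length] = (byte) 0x1f;
        data[headBytes.length + 1] = (byte) 0x8b;

        Map<String, String> headers = parseHeaders(data, headBytes.length);
        System.out.println("gzip? " + "gzip".equalsIgnoreCase(headers.get("content-encoding")));
    }
}
```

Because the reader is bounded by the header length, it cannot consume any of the binary body, which stays available in the array from index headerLength onward.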
Jersey, a high-level web framework, may save your day. You no longer have to manage gzip content, headers, etc. yourself.
The following code gets the image used in your example and saves it to disk. Then it verifies that the saved image is equal to the downloaded one:
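import static org.junit.Assert.assertArrayEquals;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;

import org.junit.Test; // JUnit 4 is assumed here

import com.google.common.io.ByteStreams;
import com.google.common.io.Files;
import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.ClientResponse;
import com.sun.jersey.api.client.WebResource;

// The class name is arbitrary; any JUnit test class will do.
public class DownloadImageTest {
    @Test
    public void test() throws IOException {
        String filename = "ps_logo2.png";
        String url = "http://www.google.com/images/logos/" + filename;
        File file = new File(filename);
        WebResource resource = Client.create().resource(url);
        ClientResponse response = resource.get(ClientResponse.class);
        // read the whole response body as raw bytes
        InputStream stream = response.getEntityInputStream();
        byte[] bytes = ByteStreams.toByteArray(stream);
        Files.write(bytes, file);
        // the file on disk must match what was downloaded
        assertArrayEquals(bytes, Files.toByteArray(file));
    }
}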
You will need two Maven dependencies to run it:
<dependency>
    <groupId>com.sun.jersey</groupId>
    <artifactId>jersey-client</artifactId>
    <version>1.6</version>
</dependency>
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>r08</version>
</dependency>
I had the same problem. I commented out the line that adds the Accept-Encoding header asking for gzip:
con.setRequestProperty("Accept-Encoding","gzip, deflate");
...and it worked!