How to store an Http Response that may contain binary data?

情话喂你 2020-12-31 19:03

As I described in a previous question, I have an assignment to write a proxy server. It partially works now, but I still have a problem with handling of gzipped information.

5 answers
  • 2020-12-31 19:10

    Store it in a byte array:

    byte[] buffer = new byte[???];
    

    A more detailed process:

    • Create a buffer large enough for the response header (and throw an exception if the header is bigger).
    • Read bytes into the buffer until you find \r\n\r\n in it. A helper such as static int arrayIndexOf(byte[] haystack, int offset, int length, byte[] needle) is useful for this.
    • When you encounter the end of the header, create a string from the first n bytes of the buffer. You can then use RegEx on this string (but note that RegEx is not the best method to parse HTTP headers).
    • Be prepared for the buffer to contain additional data after the header: these are the first bytes of the response body. You have to copy them to the output stream, output file, or output buffer.
    • Read the rest of the response body (until Content-Length bytes have been read or the stream is closed).
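
    As a rough illustration of parsing the header string without a pattern per field, the sketch below splits the block into lines and cuts each line at its first colon. The class and method names are made up for this example, and it assumes one value per header name and no folded lines.

```java
import java.util.Map;
import java.util.TreeMap;

public class HeaderParseSketch {
    /**
     * Splits a raw header block (status line plus "Name: value" lines)
     * into a map with case-insensitive keys. Repeated fields and
     * obsolete line folding are not handled in this sketch.
     */
    public static Map<String, String> parseHeaders(String headerBlock) {
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        String[] lines = headerBlock.split("\r\n");
        // lines[0] is the status line, e.g. "HTTP/1.1 200 OK"; skip it.
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon <= 0) {
                continue; // not a "Name: value" line
            }
            String name = lines[i].substring(0, colon).trim();
            String value = lines[i].substring(colon + 1).trim();
            headers.put(name, value);
        }
        return headers;
    }

    public static void main(String[] args) {
        String block = "HTTP/1.1 200 OK\r\n"
                + "Content-Type: image/png\r\n"
                + "Content-Encoding: gzip\r\n"
                + "Content-Length: 1234\r\n\r\n";
        Map<String, String> h = parseHeaders(block);
        System.out.println(h.get("content-encoding"));
        System.out.println(Integer.parseInt(h.get("Content-Length")));
    }
}
```

    The case-insensitive map is a design choice here because HTTP header names are not case-sensitive, so a lookup for "content-length" and "Content-Length" should hit the same entry.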

    Edit:

    You are not following the steps I suggested. inputReader.ready() is the wrong way to detect the phases of the response. There is no guarantee that the header will be sent in a single burst.

    I tried to sketch the steps in code (except for the arrayIndexOf function).

    import java.io.InputStream;
    import java.nio.charset.StandardCharsets;

    InputStream is; // assumed to be initialized with the server's response stream

    // Create a buffer large enough for the response header (and throw an exception if it is bigger).
    byte[] headEnd = {13, 10, 13, 10}; // \r \n \r \n
    byte[] buffer = new byte[10 * 1024];
    int length = 0;

    // Read bytes into the buffer until you find `\r\n\r\n` in the buffer.
    int bytes;
    int pos;
    while ((pos = arrayIndexOf(buffer, 0, length, headEnd)) == -1) {
        // buffer is full but the end signature has not been found
        if (length == buffer.length)
            throw new RuntimeException("Response header too long");

        bytes = is.read(buffer, length, buffer.length - length);
        // stream ended before the header was complete
        if (bytes == -1)
            throw new RuntimeException("Connection closed before the end of the header");
        length += bytes;
    }

    // pos contains the starting index of the end signature (\r\n\r\n) so we add 4 bytes
    pos += 4;

    // When you encounter the end of the header, create a string from the first *n* bytes.
    // ISO-8859-1 maps every byte to one char, so no header byte is lost or altered.
    String header = new String(buffer, 0, pos, StandardCharsets.ISO_8859_1);

    System.out.println(header);

    // Be prepared for the buffer to contain additional data after the header
    // ... so we process it
    System.out.write(buffer, pos, length - pos);

    // process the rest until the connection is closed
    while ((bytes = is.read(buffer, 0, buffer.length)) > -1) {
        System.out.write(buffer, 0, bytes);
    }
    System.out.flush();
    

    The arrayIndexOf method could look something like this: (there are probably faster versions)

    public static int arrayIndexOf(byte[] haystack, int offset, int length, byte[] needle) {
        // <= so that a needle sitting at the very end of the data is still found
        for (int i = offset; i <= offset + length - needle.length; i++) {
            boolean match = true;
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    match = false;
                    break;
                }
            }
            if (match)
                return i;
        }
        return -1;
    }
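
    The schematic above reads the body until the connection is closed. For the other case mentioned in the steps (a known Content-Length), a minimal sketch could look like the following. The class name, the method name, and the 8 KB chunk size are assumptions for illustration; buffer, pos and length have the same meaning as in the schematic.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class BodyReadSketch {
    /**
     * Collects exactly contentLength body bytes: first the bytes that were
     * already read into the header buffer (buffer[pos..length)), then the
     * remainder from the stream. contentLength is assumed to be >= 0.
     */
    public static byte[] readBody(InputStream is, byte[] buffer, int pos, int length,
                                  int contentLength) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream(contentLength);

        // Body bytes that arrived together with the header.
        int leftover = Math.min(length - pos, contentLength);
        body.write(buffer, pos, leftover);

        byte[] chunk = new byte[8 * 1024];
        int remaining = contentLength - leftover;
        while (remaining > 0) {
            // Never ask for more than is still missing, so nothing after the body is consumed.
            int n = is.read(chunk, 0, Math.min(chunk.length, remaining));
            if (n == -1) {
                throw new EOFException("Stream closed with " + remaining + " body bytes missing");
            }
            body.write(chunk, 0, n);
            remaining -= n;
        }
        return body.toByteArray();
    }
}
```

    The returned byte array holds the body exactly as sent, so gzipped content stays intact and can be forwarded or decompressed later.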
    
  • 2020-12-31 19:10

    After reading the headers with BufferedReader you'll need to detect if the Content-Encoding header is set to gzip. If it is, to read the body you'll have to switch to using the InputStream and wrap it with a GZIPInputStream to decode the body. The tricky part however is the fact that the BufferedReader will have buffered past the headers into the body and the underlying InputStream will be ahead of where you need it.

    What you could do is wrap the initial InputStream in a BufferedInputStream and call mark() on it (with a read limit large enough to cover everything the reader may buffer) before you begin processing the headers. When you're done processing the headers, call reset(). Then read that stream byte by byte until you hit the empty line between the headers and the body. Now wrap it in a GZIPInputStream to process the body.
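
    A minimal sketch of this mark()/reset() approach is shown below. The class name, method name, and the 64 KB mark limit are assumptions chosen for illustration, and the header check only looks for the word gzip in Content-Encoding.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;

public class GzipBodySketch {
    // Assumed upper bound on how far the reader may run ahead before reset().
    static final int MARK_LIMIT = 64 * 1024;

    /** Returns a stream positioned at the body, decoding gzip when the headers announce it. */
    public static InputStream openBody(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(MARK_LIMIT);

        // Pass 1: parse the headers as text. The reader buffers past them into the body.
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1));
        boolean gzip = false;
        String line;
        while ((line = reader.readLine()) != null && !line.isEmpty()) {
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.startsWith("content-encoding:") && lower.contains("gzip")) {
                gzip = true;
            }
        }

        // Pass 2: rewind and skip the header bytes one at a time, with no read-ahead.
        in.reset();
        int matched = 0; // how many bytes of \r\n\r\n have been matched so far
        int b;
        while (matched < 4 && (b = in.read()) != -1) {
            boolean expectCr = (matched % 2 == 0);
            if (b == (expectCr ? '\r' : '\n')) {
                matched++;
            } else {
                matched = (b == '\r') ? 1 : 0;
            }
        }

        // The stream now points at the first body byte.
        return gzip ? new GZIPInputStream(in) : in;
    }
}
```

    The reader from the first pass is deliberately not closed, since closing it would also close the underlying stream that the body is read from.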

  • 2020-12-31 19:30

    You basically need to parse the response headers as text, and the rest as binary. It's slightly tricky to do so, as you can't just create an InputStreamReader around the stream: that will read more data than you want. You'll quite possibly need to read data into a byte array and then decode the header part manually, for example with new String(bytes, offset, length, charset). Alternatively, if you've already read the data into a byte array, you could create a ByteArrayInputStream around it and an InputStreamReader on top. Either way, you'll need to work out how far the headers go before you get to the body of the response, which you should keep as binary data.
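
    To make the byte-array variant concrete, here is a rough sketch that assumes the whole response is already in memory. The class and method names are invented for this example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitResponseSketch {
    /** Returns the index just past the first \r\n\r\n, or -1 if there is none. */
    public static int bodyStart(byte[] response) {
        for (int i = 0; i + 3 < response.length; i++) {
            if (response[i] == '\r' && response[i + 1] == '\n'
                    && response[i + 2] == '\r' && response[i + 3] == '\n') {
                return i + 4;
            }
        }
        return -1;
    }

    /** Decodes only the header bytes as text, one line per list entry. */
    public static List<String> headerLines(byte[] response, int bodyStart) throws IOException {
        List<String> lines = new ArrayList<>();
        // The ByteArrayInputStream is limited to the header bytes, so the
        // reader cannot consume any part of the body.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(response, 0, bodyStart),
                StandardCharsets.ISO_8859_1))) {
            String line;
            while ((line = reader.readLine()) != null && !line.isEmpty()) {
                lines.add(line);
            }
        }
        return lines;
    }

    /** Copies the body out untouched, as binary data. */
    public static byte[] body(byte[] response, int bodyStart) {
        return Arrays.copyOfRange(response, bodyStart, response.length);
    }
}
```

    Callers should check bodyStart for -1 before using the other two methods, since that means the header has not fully arrived.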

  • 2020-12-31 19:30

    Jersey, a high-level web framework, may save your day. You no longer have to manage gzip content, headers, etc. yourself.

    The following code downloads the image used in your example and saves it to disk. It then verifies that the saved file is equal to the downloaded bytes:

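    import static org.junit.Assert.assertArrayEquals;

    import com.google.common.io.ByteStreams;
    import com.google.common.io.Files;
    import com.sun.jersey.api.client.Client;
    import com.sun.jersey.api.client.ClientResponse;
    import com.sun.jersey.api.client.WebResource;
    import java.io.File;
    import java.io.IOException;
    import java.io.InputStream;
    import org.junit.Test;

    // The class name is arbitrary; it only gives the test method a home.
    public class DownloadImageTest {
        @Test
        public void test() throws IOException {
            String filename = "ps_logo2.png";
            String url = "http://www.google.com/images/logos/" + filename;
            File file = new File(filename);

            // Fetch the resource and read the entity as raw bytes.
            WebResource resource = Client.create().resource(url);
            ClientResponse response = resource.get(ClientResponse.class);
            InputStream stream = response.getEntityInputStream();
            byte[] bytes = ByteStreams.toByteArray(stream);
            Files.write(bytes, file);

            // The file on disk must match the downloaded bytes exactly.
            assertArrayEquals(bytes, Files.toByteArray(file));
        }
    }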
    

    You will need two Maven dependencies to run it (plus JUnit for the @Test annotation):

    <dependency>
        <groupId>com.sun.jersey</groupId>
        <artifactId>jersey-client</artifactId>
        <version>1.6</version>
    </dependency>
    <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
        <version>r08</version>
    </dependency>
    
  • 2020-12-31 19:34

    I had the same problem. I commented out the line that adds the Accept-Encoding header, so the server should no longer compress the response:

    con.setRequestProperty("Accept-Encoding","gzip, deflate");
    

    ...and it worked!
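
    If you would rather keep asking for compressed responses, one option is to decode the body only when the server says it is gzipped. The sketch below assumes that con is an HttpURLConnection; the class and method names are made up for this example, and it advertises only gzip because that is all it can decode.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.zip.GZIPInputStream;

public class GzipAwareFetchSketch {
    /** Wraps the stream in a GZIPInputStream only when the encoding says gzip. */
    public static InputStream decode(InputStream in, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(in);
        }
        return in;
    }

    /** Fetches a URL while still advertising gzip support, and returns the decoded body. */
    public static byte[] fetch(String url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) URI.create(url).toURL().openConnection();
        con.setRequestProperty("Accept-Encoding", "gzip");
        try (InputStream in = decode(con.getInputStream(), con.getContentEncoding())) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8 * 1024];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            con.disconnect();
        }
    }
}
```

    A proxy that forwards the body unchanged does not need the decode step at all; it only matters when the proxy itself has to inspect or modify the content.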
