I would like to compress an input stream in Java using GZIP compression.
Let's say we have an uncompressed input stream (1 GB of data, say). I want a compressed InputStream as the result:
public InputStream getCompressed( InputStream is ) throws IOException
{
    byte[] data = new byte[2048];
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    GZIPOutputStream zos = new GZIPOutputStream( bos );
    BufferedInputStream entryStream = new BufferedInputStream( is, 2048 );
    int count;
    while ( ( count = entryStream.read( data, 0, 2048 ) ) != -1 )
    {
        zos.write( data, 0, count );
    }
    entryStream.close();
    zos.close();
    return new ByteArrayInputStream( bos.toByteArray() );
}
ref: zip compression
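A self-contained round trip of this in-memory approach (compress with GZIPOutputStream, then decompress with GZIPInputStream to confirm the bytes survive) might look like this; the class name and sample string are illustrative only:

```java
import java.io.*;
import java.util.zip.*;

public class GzipRoundTrip {
    // Compress all bytes of an InputStream into memory and return an InputStream
    // over the gzipped result (fine for small inputs, not for a 1 GB stream).
    static InputStream getCompressed(InputStream is) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream zos = new GZIPOutputStream(bos)) {
            byte[] data = new byte[2048];
            int count;
            while ((count = is.read(data)) != -1) {
                zos.write(data, 0, count);
            }
        }
        return new ByteArrayInputStream(bos.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello gzip".getBytes("UTF-8");
        InputStream gz = getCompressed(new ByteArrayInputStream(original));
        // Decompress again to prove the stream is valid gzip.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gin = new GZIPInputStream(gz)) {
            byte[] buf = new byte[1024];
            int n;
            while ((n = gin.read(buf)) != -1) out.write(buf, 0, n);
        }
        System.out.println(new String(out.toByteArray(), "UTF-8"));
    }
}
```

Note that the whole compressed result lives in memory here, which is exactly the limitation the later answers try to avoid.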
Shouldn't you be looking at GZIPOutputStream in that case?
public InputStream getCompressedStream(InputStream input) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (OutputStream output = new GZIPOutputStream(bytes)) {
        IOUtils.copy(input, output);
    }
    return new ByteArrayInputStream(bytes.toByteArray());
}
It seems I'm three years late, but maybe this will be useful for someone.
My solution is similar to @Michael Wyraz's; the only difference is that mine is based on FilterInputStream.
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
public class GZipInputStreamDeflater extends FilterInputStream {

    private static enum Stage {
        HEADER,
        DATA,
        FINALIZATION,
        TRAILER,
        FINISH
    }

    private GZipInputStreamDeflater.Stage stage = Stage.HEADER;

    // The first constructor argument is the compression *level*, so pass
    // DEFAULT_COMPRESSION, not DEFLATED (the compression *method*); "true"
    // requests raw deflate output, since we write the gzip framing ourselves.
    private final Deflater deflater = new Deflater( Deflater.DEFAULT_COMPRESSION, true );
    private final CRC32 crc = new CRC32();

    /* GZIP header magic number */
    private final static int GZIP_MAGIC = 0x8b1f;

    private ByteArrayInputStream trailer = null;
    private ByteArrayInputStream header = new ByteArrayInputStream( new byte[] {
        (byte) GZIP_MAGIC,          // Magic number (short)
        (byte) ( GZIP_MAGIC >> 8 ), // Magic number (short)
        Deflater.DEFLATED,          // Compression method (CM)
        0,                          // Flags (FLG)
        0,                          // Modification time MTIME (int)
        0,                          // Modification time MTIME (int)
        0,                          // Modification time MTIME (int)
        0,                          // Modification time MTIME (int)
        0,                          // Extra flags (XFLG)
        0,                          // Operating system (OS)
    } );

    public GZipInputStreamDeflater( InputStream in ) {
        super( in );
        crc.reset();
    }

    @Override
    public int read( byte[] b, int off, int len ) throws IOException {
        int read = -1;
        switch( stage ) {
            case FINISH:
                return -1;
            case HEADER:
                read = header.read( b, off, len );
                if( header.available() == 0 ) {
                    stage = Stage.DATA;
                }
                return read;
            case DATA:
                byte[] b2 = new byte[len];
                read = super.read( b2, 0, len );
                if( read <= 0 ) {
                    stage = Stage.FINALIZATION;
                    deflater.finish();
                    return 0;
                }
                else {
                    deflater.setInput( b2, 0, read );
                    crc.update( b2, 0, read );
                    read = 0;
                    while( !deflater.needsInput() && len - read > 0 ) {
                        read += deflater.deflate( b, off + read, len - read, Deflater.NO_FLUSH );
                    }
                    return read;
                }
            case FINALIZATION:
                if( deflater.finished() ) {
                    stage = Stage.TRAILER;
                    int crcValue = (int) crc.getValue();
                    int totalIn = deflater.getTotalIn();
                    // Trailer: CRC-32 and uncompressed size (ISIZE), both little-endian
                    trailer = new ByteArrayInputStream( new byte[] {
                        (byte) ( crcValue >> 0 ),
                        (byte) ( crcValue >> 8 ),
                        (byte) ( crcValue >> 16 ),
                        (byte) ( crcValue >> 24 ),
                        (byte) ( totalIn >> 0 ),
                        (byte) ( totalIn >> 8 ),
                        (byte) ( totalIn >> 16 ),
                        (byte) ( totalIn >> 24 ),
                    } );
                    return 0;
                }
                else {
                    read = deflater.deflate( b, off, len, Deflater.FULL_FLUSH );
                    return read;
                }
            case TRAILER:
                read = trailer.read( b, off, len );
                if( trailer.available() == 0 ) {
                    stage = Stage.FINISH;
                }
                return read;
        }
        return -1;
    }

    @Override
    public void close() throws IOException {
        super.close();
        deflater.end();
        if( trailer != null ) {
            trailer.close();
        }
        header.close();
    }
}
Usage:
AmazonS3Client s3client = new AmazonS3Client( ... );
try( InputStream in = new GZipInputStreamDeflater( new URL( "http://....../very-big-file.csv" ).openStream() ) ) {
    PutObjectRequest putRequest = new PutObjectRequest( "BUCKET-NAME", "/object/key", in, new ObjectMetadata() );
    s3client.putObject( putRequest );
}
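The hand-rolled header above mirrors what GZIPOutputStream itself emits: every gzip stream starts with the two magic bytes 0x1f 0x8b (GZIP_MAGIC written little-endian), followed by the method byte and flags. A small sketch to confirm this, using only java.util.zip (the class name is illustrative):

```java
import java.io.*;
import java.util.zip.*;

public class GzipHeaderCheck {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write("x".getBytes("UTF-8"));
        }
        byte[] out = bos.toByteArray();
        // GZIP_MAGIC = 0x8b1f, written little-endian: 0x1f then 0x8b
        System.out.printf("%02x %02x%n", out[0] & 0xff, out[1] & 0xff);
    }
}
```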
To compress data you need the GZIPOutputStream. But since you need to read the data back as if from an InputStream, you need to convert the OutputStream to an InputStream. You can write the compressed bytes to a ByteArrayOutputStream and use toByteArray() to do so:
ByteArrayOutputStream bos = new ByteArrayOutputStream();
GZIPOutputStream gout = new GZIPOutputStream(bos);
//... Code to read from your original uncompressed data and write to gout.
gout.close();
//Convert to InputStream.
new ByteArrayInputStream(bos.toByteArray());
But this method has the limitation that you need to read in all the data first - and that means you have to have enough memory to hold that buffer.
Alternative approaches using Pipes are mentioned in this thread - How to convert OutputStream to InputStream?
Here is a version that I wrote that doesn't have the CRC/GZIP Magic cookies in it, because it delegates to a GZIPOutputStream. It is also memory efficient in that it only uses enough memory to buffer the compression (a 42MB file used a 45k buffer). Performance is the same as compressing to memory.
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;
/**
* Compresses an InputStream in a memory-optimal, on-demand way only compressing enough to fill a buffer.
*
* @author Ben La Monica
*/
public class GZIPCompressingInputStream extends InputStream {

    private InputStream in;
    private GZIPOutputStream gz;
    private OutputStream delegate;
    private byte[] buf = new byte[8192];
    private byte[] readBuf = new byte[8192];
    int read = 0;
    int write = 0;

    public GZIPCompressingInputStream(InputStream in) throws IOException {
        this.in = in;
        this.delegate = new OutputStream() {
            private void growBufferIfNeeded(int len) {
                if ((write + len) >= buf.length) {
                    // grow the array if we don't have enough space to fulfill the incoming data
                    byte[] newbuf = new byte[(buf.length + len) * 2];
                    System.arraycopy(buf, 0, newbuf, 0, buf.length);
                    buf = newbuf;
                }
            }

            @Override
            public void write(byte[] b, int off, int len) throws IOException {
                growBufferIfNeeded(len);
                System.arraycopy(b, off, buf, write, len);
                write += len;
            }

            @Override
            public void write(int b) throws IOException {
                growBufferIfNeeded(1);
                buf[write++] = (byte) b;
            }
        };
        this.gz = new GZIPOutputStream(delegate);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        compressStream();
        int numBytes = Math.min(len, write - read);
        if (numBytes > 0) {
            System.arraycopy(buf, read, b, off, numBytes);
            read += numBytes;
        } else if (len > 0) {
            // if bytes were requested, but we have none, then we're at the end of the stream
            return -1;
        }
        return numBytes;
    }

    private void compressStream() throws IOException {
        // if the reader has caught up with the writer, then zero the positions out
        if (read == write) {
            read = 0;
            write = 0;
        }

        while (write == 0) {
            // feed the gzip stream data until it spits out a block
            int val = in.read(readBuf);
            if (val == -1) {
                // nothing left to do, we've hit the end of the stream. finalize and break out
                gz.close();
                break;
            } else if (val > 0) {
                gz.write(readBuf, 0, val);
            }
        }
    }

    @Override
    public int read() throws IOException {
        compressStream();
        if (write == 0) {
            // write should not be 0 if we were able to get data from compressStream, must mean we're at the end
            return -1;
        } else {
            // reading a single byte
            return buf[read++] & 0xFF;
        }
    }

    @Override
    public void close() throws IOException {
        // release the deflater and the underlying stream
        // (GZIPOutputStream.close is a no-op if already closed at end-of-stream)
        gz.close();
        in.close();
    }
}
PipedOutputStream lets you write to a GZIPOutputStream and expose that data through an InputStream. It has a fixed memory cost, unlike the other solutions, which buffer the entire stream of data in an array or file. The only problem is that you can't read and write from the same thread; you have to use a separate one.
private InputStream gzipInputStream(InputStream in) throws IOException {
    PipedInputStream zipped = new PipedInputStream();
    PipedOutputStream pipe = new PipedOutputStream(zipped);
    new Thread(() -> {
        try (OutputStream zipper = new GZIPOutputStream(pipe)) {
            IOUtils.copy(in, zipper);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }).start();
    return zipped;
}
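The same idea works without commons-io by replacing IOUtils.copy with a plain copy loop. A self-contained sketch that also round-trips the result through GZIPInputStream to show the pipe produces a valid gzip stream (class name and sample data are illustrative):

```java
import java.io.*;
import java.util.zip.*;

public class PipedGzip {
    // Compress on a background thread into a PipedOutputStream;
    // the caller reads the gzip bytes from the connected PipedInputStream.
    static InputStream gzipInputStream(InputStream in) throws IOException {
        PipedInputStream zipped = new PipedInputStream();
        PipedOutputStream pipe = new PipedOutputStream(zipped);
        new Thread(() -> {
            try (OutputStream zipper = new GZIPOutputStream(pipe)) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) zipper.write(buf, 0, n);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }).start();
        return zipped;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "piped gzip example".getBytes("UTF-8");
        InputStream gz = gzipInputStream(new ByteArrayInputStream(data));
        // Decompress on this thread while the background thread compresses.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gin = new GZIPInputStream(gz)) {
            byte[] buf = new byte[1024];
            int n;
            while ((n = gin.read(buf)) != -1) out.write(buf, 0, n);
        }
        System.out.println(new String(out.toByteArray(), "UTF-8"));
    }
}
```

One design caveat: if the writer thread hits an IOException, the reader side only sees a "pipe broken" error once the thread dies, so real code would want to propagate the failure rather than just print it.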