Iterable gzip deflate/inflate in Java

时光说笑 2020-12-21 05:07

Is there a library for gzip-deflating in terms of ByteBuffers hidden in the Internet? Something which allows us to push raw data then pull deflated data? We have searched fo

3 answers
  • 2020-12-21 05:21

    Much credit to Mark Adler for suggesting this approach, which is much better than my original answer.

    package stack;
    
    import java.io.*;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.util.zip.CRC32;
    import java.util.zip.Deflater;
    
    public class BufferDeflate2 {
        /** The standard 10 byte GZIP header */
        private static final byte[] GZIP_HEADER = new byte[] { 0x1f, (byte) 0x8b,
                Deflater.DEFLATED, 0, 0, 0, 0, 0, 0, 0 };
    
        /** CRC-32 of uncompressed data. */
        private final CRC32 crc = new CRC32();
    
        /** Deflater to deflate data */
        private final Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION,
                true);
    
        /** Output buffer building area */
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    
        /** Internal transfer space */
        private final byte[] transfer = new byte[1000];
    
        /** The flush mode to use at the end of each buffer */
        private final int flushMode;
    
    
        /**
         * New buffer deflater
         * 
         * @param syncFlush
         *            if true, all data in buffer can be immediately decompressed
         *            from output buffer
         */
        public BufferDeflate2(boolean syncFlush) {
            flushMode = syncFlush ? Deflater.SYNC_FLUSH : Deflater.NO_FLUSH;
            buffer.write(GZIP_HEADER, 0, GZIP_HEADER.length);
        }
    
    
        /**
         * Deflate the buffer
         * 
         * @param in
         *            the buffer to deflate
         * @return deflated representation of the buffer
         */
        public ByteBuffer deflate(ByteBuffer in) {
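            // convert buffer to bytes
            byte[] inBytes;
            int off;
            int len = in.remaining();
            if( in.hasArray() ) {
                // the backing array may start before the buffer's first element
                inBytes = in.array();
                off = in.arrayOffset() + in.position();
                in.position(in.limit()); // mark the input as consumed
            } else {
                off = 0;
                inBytes = new byte[len];
                in.get(inBytes);
            }
    
            // update CRC and deflater
            crc.update(inBytes, off, len);
            deflater.setInput(inBytes, off, len);
    
            // keep draining while the transfer buffer comes back full, so that
            // flushed output is not left behind inside the deflater
            int r;
            do {
                r = deflater.deflate(transfer, 0, transfer.length, flushMode);
                buffer.write(transfer, 0, r);
            } while( r == transfer.length || !deflater.needsInput() );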
    
            byte[] outBytes = buffer.toByteArray();
            buffer.reset();
            return ByteBuffer.wrap(outBytes);
        }
    
    
        /**
         * Write the final buffer. This writes any remaining compressed data and the GZIP trailer.
         * @return the final buffer
         */
        public ByteBuffer doFinal() {
            // finish deflating
            deflater.finish();
    
            // write all remaining data, looping until the deflater reports
            // that the compressed stream is complete
            while( !deflater.finished() ) {
                int r = deflater.deflate(transfer, 0, transfer.length);
                buffer.write(transfer, 0, r);
            }
    
            // write GZIP trailer
            writeInt((int) crc.getValue());
            writeInt((int) deflater.getBytesRead());
    
            // reset deflater
            deflater.reset();
    
            // final output
            byte[] outBytes = buffer.toByteArray();
            buffer.reset();
            return ByteBuffer.wrap(outBytes);
        }
    
    
        /**
         * Write a 32 bit value in little-endian order
         * 
         * @param v
         *            the value to write
         */
        private void writeInt(int v) {
            buffer.write(v & 0xff);
            buffer.write((v >> 8) & 0xff);
            buffer.write((v >> 16) & 0xff);
            buffer.write((v >> 24) & 0xff);
        }
    
    
        /**
         * For testing. Pass in the name of a file to GZIP compress
         * @param args
         * @throws IOException
         */
        public static void main(String[] args) throws IOException {
            File inFile = new File(args[0]);
            File outFile = new File(args[0] + ".test.gz");
    
            BufferDeflate2 def = new BufferDeflate2(false);
    
            // try-with-resources closes both channels even if an I/O error occurs
            try (FileChannel inChan = new FileInputStream(inFile).getChannel();
                    FileChannel outChan = new FileOutputStream(outFile).getChannel()) {
                ByteBuffer buf = ByteBuffer.allocate(500);
                while( true ) {
                    buf.clear();
                    int r = inChan.read(buf);
                    if( r == -1 ) break;
                    buf.flip();
                    ByteBuffer compBuf = def.deflate(buf);
                    outChan.write(compBuf);
                }
    
                ByteBuffer compBuf = def.doFinal();
                outChan.write(compBuf);
            }
        }
    }
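
    To illustrate what the syncFlush flag is for, here is a minimal self-contained sketch. It is not part of the class above: the class name SyncFlushDemo and the helpers push and pull are made up for illustration. It assumes raw deflate data (nowrap) on both sides and shows that, when every chunk ends with SYNC_FLUSH, an Inflater can recover each chunk as soon as it arrives, without waiting for the end of the stream.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class SyncFlushDemo {
    /** Deflates one chunk and sync-flushes, so the returned bytes are decodable on their own. */
    static byte[] push(Deflater deflater, byte[] chunk) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] tmp = new byte[256];
        deflater.setInput(chunk);
        int n;
        do {
            n = deflater.deflate(tmp, 0, tmp.length, Deflater.SYNC_FLUSH);
            out.write(tmp, 0, n);
        } while (n == tmp.length); // a full buffer means more output may be pending
        return out.toByteArray();
    }

    /** Feeds compressed bytes to the inflater and returns everything it can produce from them. */
    static byte[] pull(Inflater inflater, byte[] compressed) throws DataFormatException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] tmp = new byte[256];
        inflater.setInput(compressed);
        int n;
        while ((n = inflater.inflate(tmp)) > 0) {
            out.write(tmp, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // true = raw deflate
        Inflater inflater = new Inflater(true);
        try {
            for (String text : new String[] { "first chunk", "second chunk" }) {
                byte[] wire = push(deflater, text.getBytes(StandardCharsets.UTF_8));
                byte[] back = pull(inflater, wire);
                System.out.println(new String(back, StandardCharsets.UTF_8));
            }
        } finally {
            deflater.end();
            inflater.end();
        }
    }
}
```

    The price of SYNC_FLUSH is a few extra bytes per chunk and a somewhat worse compression ratio, so it is only worth enabling when the reader needs the data immediately.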
    
  • 2020-12-21 05:33

    Processing ByteBuffers is not hard. See my sample code below. You need to know how the buffers are created. The options are:

    1. Each buffer is compressed independently. This is so simple to handle that I assume it is not the case. You would just transform the buffer into a byte array and wrap it in a ByteArrayInputStream within a GZIPInputStream.
    2. Each buffer was ended with a SYNC_FLUSH by the writer, and thus comprises an entire block of data within a stream. All the data written by the writer to the buffer can be read immediately by the reader.
    3. Each buffer is just part of a GZIP stream. There is no guarantee the reader can read anything from the buffer.
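
    Option 1 can be sketched as follows. This is only an illustration under the assumption that each buffer holds one complete GZIP stream; the class name IndependentBuffers and the gzip helper (used only to produce test input) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class IndependentBuffers {
    /** Inflates a buffer that holds exactly one complete GZIP stream. */
    static ByteBuffer gunzip(ByteBuffer in) throws IOException {
        // copy the remaining bytes out of the buffer (works for direct buffers too)
        byte[] compressed = new byte[in.remaining()];
        in.get(compressed);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return ByteBuffer.wrap(out.toByteArray());
    }

    /** Demo helper: compresses a byte array into a standalone GZIP buffer. */
    static ByteBuffer gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        }
        return ByteBuffer.wrap(out.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer packed = gzip("hello, gzip".getBytes(StandardCharsets.UTF_8));
        ByteBuffer plain = gunzip(packed);
        System.out.println(StandardCharsets.UTF_8.decode(plain));
    }
}
```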

    Data generated by GZIP must be processed in order. The ByteBuffers will have to be processed in the same order they are generated.

    Sample code:

    package stack;
    
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.nio.ByteBuffer;
    import java.nio.channels.Channels;
    import java.nio.channels.Pipe;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.zip.GZIPInputStream;
    
    public class BufferDeflate {
    
        static final AtomicInteger idSrc = new AtomicInteger(1);
    
        /** End-of-stream marker; a BlockingQueue cannot hold null, so add(null) queues this instead */
        private static final ByteBuffer EOF = ByteBuffer.allocate(0);
    
        /** Queue for transferring buffers */
        final BlockingQueue<ByteBuffer> buffers = new LinkedBlockingQueue<ByteBuffer>();
    
        /** The entry point for deflated buffers */
        final Pipe.SinkChannel bufSink;
    
        /** The source for the inflater */
        final Pipe.SourceChannel infSource;
    
        /** The destination for the inflater */
        final Pipe.SinkChannel infSink;
    
        /** The source for the outside world (non-blocking; typed so callers can read from it) */
        public final Pipe.SourceChannel source;
    
    
    
        class Relayer extends Thread {
            public Relayer(int id) {
                super("BufferRelayer" + id);
            }
    
    
            public void run() {
                try {
                    while( true ) {
                        ByteBuffer buf = buffers.take();
                        if( buf == EOF ) {
                            // end of stream requested through add(null)
                            bufSink.close();
                            break;
                        }
                        while( buf.hasRemaining() ) {
                            bufSink.write(buf);
                        }
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    
    
    
        class Inflater extends Thread {
            public Inflater(int id) {
                super("BufferInflater" + id);
            }
    
    
            public void run() {
                // out is opened first so it is closed last, and is closed even
                // if the GZIP header turns out to be invalid
                try (OutputStream out = Channels.newOutputStream(infSink);
                        InputStream in = Channels.newInputStream(infSource);
                        GZIPInputStream gzip = new GZIPInputStream(in)) {
                    // copy in chunks rather than one byte at a time
                    byte[] chunk = new byte[4096];
                    int n;
                    while( (n = gzip.read(chunk)) != -1 ) {
                        out.write(chunk, 0, n);
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    
    
        /**
         * New buffer inflater
         */
        public BufferDeflate() throws IOException {
            Pipe pipe = Pipe.open();
            bufSink = pipe.sink();
            infSource = pipe.source();
    
            pipe = Pipe.open();
            infSink = pipe.sink();
            source = pipe.source();
            source.configureBlocking(false);
    
            int id = idSrc.getAndIncrement();
    
            Thread thread = new Relayer(id);
            thread.setDaemon(true);
            thread.start();
    
            thread = new Inflater(id);
            thread.setDaemon(true);
            thread.start();
        }
    
    
        /**
         * Add the buffer to the stream. A null buffer closes the stream
         * 
         * @param buf
         *            the buffer to add, or null to signal the end of the stream
         * @throws IOException
         */
        public void add(ByteBuffer buf) throws IOException {
            buffers.offer(buf == null ? EOF : buf);
        }
    }
    

    Simply pass the buffers to the add method and read from the public source channel. The amount of data that can be read from GZIP after processing a given number of bytes is impossible to predict. I have therefore made the source channel non-blocking so you can safely read from it in the same thread that you add the byte buffers.
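
    To show what reading from a non-blocking pipe source looks like, here is a small standalone sketch. It uses a bare Pipe rather than the class above, and the names NonBlockingPipeDemo and readAvailable are hypothetical. The point is that a non-blocking read returns 0 instead of stalling when no inflated data is ready yet, so the caller can go back to adding buffers and poll again later.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

class NonBlockingPipeDemo {
    /** Returns whatever bytes are ready right now; an empty array means "nothing yet". */
    static byte[] readAvailable(Pipe.SourceChannel source) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteBuffer chunk = ByteBuffer.allocate(4096);
        // a non-blocking read returns 0 when nothing is ready and -1 at end of stream
        while (source.read(chunk) > 0) {
            chunk.flip();
            out.write(chunk.array(), 0, chunk.limit());
            chunk.clear();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open();
        Pipe.SourceChannel source = pipe.source();
        source.configureBlocking(false);

        // nothing has been written yet, so this returns immediately with no data
        System.out.println("before write: " + readAvailable(source).length + " bytes");

        pipe.sink().write(ByteBuffer.wrap("inflated data".getBytes(StandardCharsets.UTF_8)));

        // poll until the bytes show up; real code would do other work between polls
        ByteArrayOutputStream got = new ByteArrayOutputStream();
        while (got.size() == 0) {
            byte[] part = readAvailable(source);
            got.write(part, 0, part.length);
        }
        System.out.println("after write: " + got.size() + " bytes");

        pipe.sink().close();
        source.close();
    }
}
```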

  • 2020-12-21 05:44

    I don't understand the "hidden in the internet" part, but zlib does in-memory gzip format compression and decompression. The java.util.zip API provides some access to zlib, though it is limited. Due to the interface limitations, you cannot request that zlib produce and consume gzip streams directly. You can however use the nowrap option to produce and consume raw deflate data. Then it's easy to roll your own gzip header and trailer, using the CRC32 class in java.util.zip. You can prepend a fixed 10-byte header, append the four-byte CRC and then the four-byte uncompressed length (modulo 2^32), both in little-endian order, and you're good to go.
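
    A minimal sketch of that recipe, for a single in-memory array rather than a stream of buffers. The class name ManualGzip and the helpers writeIntLE and gunzip are made up for illustration; gunzip exists only to check that the standard GZIPInputStream accepts the hand-built framing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

class ManualGzip {
    /** Fixed header: magic 1f 8b, method 8 (deflate), no flags, zero mtime, no extra flags, OS 0. */
    private static final byte[] HEADER = { 0x1f, (byte) 0x8b, 8, 0, 0, 0, 0, 0, 0, 0 };

    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(HEADER, 0, HEADER.length);

        // nowrap = true makes the Deflater emit raw deflate data without a zlib wrapper
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try {
            deflater.setInput(raw);
            deflater.finish();
            byte[] chunk = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
        } finally {
            deflater.end();
        }

        // trailer: CRC-32 of the uncompressed data, then its length, both little-endian
        CRC32 crc = new CRC32();
        crc.update(raw);
        writeIntLE(out, (int) crc.getValue());
        writeIntLE(out, raw.length);
        return out.toByteArray();
    }

    private static void writeIntLE(ByteArrayOutputStream out, int v) {
        out.write(v & 0xff);
        out.write((v >>> 8) & 0xff);
        out.write((v >>> 16) & 0xff);
        out.write((v >>> 24) & 0xff);
    }

    /** Check helper: decodes with the standard GZIPInputStream, which validates the trailer. */
    static byte[] gunzip(byte[] gz) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            byte[] chunk = new byte[1024];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] gz = gzip("rolled by hand".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(gunzip(gz), StandardCharsets.UTF_8));
    }
}
```

    For data longer than an int can count, the length field should be the low 32 bits of the total, which is what the modulo in the description above refers to.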
