How to create a compressed Zip archive using ZipOutputStream so that the getSize() method of ZipEntry returns the correct size?

执念已碎 2021-02-14 19:23

Consider the code example that puts a single file, test_file.pdf, into the zip archive test.zip and then reads this archive back:

import java.io.*;
         


        
2 answers
  • 2021-02-14 19:41

    A simple and elegant workaround is to write the ZipEntry to a temporary ZipOutputStream first. This is what the updateEntry method of the following code does. When the method has been called, the ZipEntry knows the size, compressed size and CRC, without having to calculate them explicitly. When it is written to the target ZipOutputStream, it will correctly write the values.

    Original answer:


    dirty but fast

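    import java.io.*;
    import java.nio.file.*;
    import java.util.zip.*;

    public static void main(String[] args) throws IOException 
    {
        // Read the whole file; available() plus a single read() is not guaranteed to do that.
        byte[] buf = Files.readAllBytes( Paths.get( "source.txt" ) );
        ZipEntry e = new ZipEntry( "source.txt" );
    
        // Dry run: fills in size, compressed size and CRC on the entry.
        updateEntry(e, buf);
    
        try (ZipOutputStream zos = new ZipOutputStream( new FileOutputStream( "result.zip" ) ))
        {
            zos.putNextEntry(e);
            zos.write(buf);
            zos.closeEntry();
        }
    }
    
    private static void updateEntry(ZipEntry entry, byte[] buffer) throws IOException
    {
        // Writing to a throwaway in-memory archive makes closeEntry() record the values on the entry.
        try (ZipOutputStream zos = new ZipOutputStream( new ByteArrayOutputStream() ))
        {
            zos.putNextEntry(entry);
            zos.write(buffer);
            zos.closeEntry();
        }
        // Assumption: newer JDKs may only trust a compressed size that was set through the setter,
        // so set it explicitly. This is harmless on older versions.
        entry.setCompressedSize(entry.getCompressedSize());
    }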
    
  • 2021-02-14 19:52

    You can only set the uncompressed size if you also set the CRC and the compressed size. This information is stored in a header before the actual data, and since ZipOutputStream can’t rewind arbitrary OutputStreams, it can’t calculate these values while writing and store them afterwards (but it will calculate them to verify the provided values).
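
    To make the symptom concrete, here is a small sketch (the class and method names are made up for illustration) that writes an entry without any size information and reads it back with ZipInputStream. With the standard java.util.zip classes, the reader should report -1 right after getNextEntry(), because the values only arrive in the data descriptor that follows the entry data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class MissingSizeDemo {

    // Writes one DEFLATED entry without setting size, compressed size or CRC.
    static byte[] zipWithoutSizes(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry("data"));
            zos.write(data, 0, data.length);
            zos.closeEntry();
        }
        return bos.toByteArray();
    }

    // Returns getSize() right after getNextEntry() and again after draining the entry.
    static long[] sizeBeforeAndAfterReading(byte[] data) {
        try (ZipInputStream zis = new ZipInputStream(
                new ByteArrayInputStream(zipWithoutSizes(data)))) {
            ZipEntry entry = zis.getNextEntry();
            long before = entry.getSize();
            byte[] buf = new byte[1024];
            while (zis.read(buf) != -1) { /* discard the entry data */ }
            long after = entry.getSize();
            return new long[] { before, after };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] sizes = sizeBeforeAndAfterReading(new byte[5000]);
        System.out.println("before reading: " + sizes[0]);
        System.out.println("after reading:  " + sizes[1]);
    }
}
```

    The second value is only filled in once the stream has read past the entry, which is why it is of no use for sizing a buffer up front.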

    Here is a solution that calculates the values in one pass before writing. It relies on the fact that you can rewind an input stream if it is backed by a file.

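    import java.io.*;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.util.zip.*;

    public static void main(String[] args) throws IOException {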
        File infile  = new File("test_file.pdf");
        File outfile = new File("test.zip");
        try (FileInputStream  fis = new FileInputStream(infile);
             FileOutputStream fos = new FileOutputStream(outfile);
             ZipOutputStream  zos = new ZipOutputStream(fos) ) {
    
            byte[]  buffer = new byte[1024];
            ZipEntry entry = new ZipEntry("data");
            precalc(entry, fis.getChannel());
            zos.putNextEntry(entry);
            for(int bytesRead; (bytesRead = fis.read(buffer)) >= 0; )
                zos.write(buffer, 0, bytesRead);
            zos.closeEntry();
        }
    
        try(FileInputStream fin = new FileInputStream(outfile);
            ZipInputStream  zis = new ZipInputStream(fin) ) {
    
            ZipEntry entry = zis.getNextEntry();
            System.out.println("Entry size: " + entry.getSize());
            System.out.println("Compressed size: " + entry.getCompressedSize());
            System.out.println("CRC: " + entry.getCrc());
            zis.closeEntry();
        }
    }
    
    private static void precalc(ZipEntry entry, FileChannel fch) throws IOException {
        long uncompressed = fch.size();
        int method = entry.getMethod();
        CRC32 crc = new CRC32();
        Deflater def;
        byte[] drain;
        if(method != ZipEntry.STORED) {
            def   = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
            drain = new byte[1024];
        }
        else {
            def   = null;
            drain = null;
        }
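        // At least one byte: reading into a zero-capacity buffer would never reach end-of-file.
        ByteBuffer buf = ByteBuffer.allocate((int)Math.max(1, Math.min(uncompressed, 4096)));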
        for(int bytesRead; (bytesRead = fch.read(buf)) != -1; buf.clear()) {
            crc.update(buf.array(), buf.arrayOffset(), bytesRead);
            if(def!=null) {
                def.setInput(buf.array(), buf.arrayOffset(), bytesRead);
                while(!def.needsInput()) def.deflate(drain, 0, drain.length);
            }
        }
        entry.setSize(uncompressed);
        if(def!=null) {
            def.finish();
            while(!def.finished()) def.deflate(drain, 0, drain.length);
            entry.setCompressedSize(def.getBytesWritten());
            def.end(); // release the native zlib resources of the dry-run deflater
        }
        entry.setCrc(crc.getValue());
        fch.position(0);
    }
    

    It handles both uncompressed and compressed entries, but unfortunately only with the default compression level, because ZipOutputStream has no method for querying the current level. So if you change the compression level, you have to keep the precalc code in sync. Alternatively, you could move the logic into a subclass of ZipOutputStream and use the same Deflater, so it will automatically have the same configuration.
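
    The subclass idea could look roughly like the following sketch. It is not part of the answer's code: PrecalcZipOutputStream and putEntry are hypothetical names, the data is assumed to be available as a byte array, and only DEFLATED entries are handled. It does a dry run on the inherited protected field def, so a level chosen via setLevel() should apply to both the dry run and the real write.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class PrecalcZipOutputStream extends ZipOutputStream {

    PrecalcZipOutputStream(OutputStream out) {
        super(out);
    }

    // Hypothetical helper: writes data as one DEFLATED entry with all sizes known up front.
    void putEntry(String name, byte[] data) throws IOException {
        closeEntry(); // make sure no entry is open, so the shared deflater is idle

        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);

        // Dry run on the inherited deflater, discarding the compressed output.
        byte[] drain = new byte[1024];
        def.reset();
        def.setInput(data, 0, data.length);
        def.finish();
        while (!def.finished()) def.deflate(drain, 0, drain.length);
        long compressed = def.getBytesWritten();
        def.reset(); // back to a clean state for the real write

        ZipEntry entry = new ZipEntry(name);
        entry.setMethod(ZipEntry.DEFLATED);
        entry.setSize(data.length);
        entry.setCompressedSize(compressed);
        entry.setCrc(crc.getValue());
        putNextEntry(entry);
        write(data, 0, data.length);
        closeEntry(); // throws ZipException if the precalculated values do not match
    }

    // Builds an archive in memory and returns the size a ZipInputStream reports up front.
    static long sizeSeenByReader(byte[] data, int level) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (PrecalcZipOutputStream zos = new PrecalcZipOutputStream(bos)) {
                zos.setLevel(level);
                zos.putEntry("data", data);
            }
            try (ZipInputStream zis = new ZipInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                return zis.getNextEntry().getSize();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[20000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) (i % 251);
        System.out.println("Entry size: " + sizeSeenByReader(data, Deflater.BEST_COMPRESSION));
    }
}
```

    The price is that every entry is compressed twice, just as with the separate precalc method.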

    A solution working with arbitrary source input streams would require buffering of the entire entry data.
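
    A minimal sketch of that buffering variant follows, assuming the entry fits into memory and the ZipOutputStream keeps its default compression level. The names BufferedZipEntryDemo, addEntry and the other helpers are invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class BufferedZipEntryDemo {

    // Reads the whole source into memory; assumes the entry fits on the heap.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        for (int n; (n = in.read(chunk)) != -1; ) bos.write(chunk, 0, n);
        return bos.toByteArray();
    }

    // Dry-run compression with the settings ZipOutputStream uses by default.
    static long deflatedSize(byte[] data) {
        Deflater def = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try {
            def.setInput(data, 0, data.length);
            def.finish();
            byte[] drain = new byte[1024];
            while (!def.finished()) def.deflate(drain, 0, drain.length);
            return def.getBytesWritten();
        } finally {
            def.end();
        }
    }

    // Buffers the source, precalculates all three values, then writes the entry.
    static void addEntry(ZipOutputStream zos, String name, InputStream source)
            throws IOException {
        byte[] data = readFully(source);
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);

        ZipEntry entry = new ZipEntry(name);
        entry.setMethod(ZipEntry.DEFLATED);
        entry.setSize(data.length);
        entry.setCompressedSize(deflatedSize(data));
        entry.setCrc(crc.getValue());

        zos.putNextEntry(entry);
        zos.write(data, 0, data.length);
        zos.closeEntry(); // throws ZipException if the precalculated values do not match
    }

    // Builds an archive in memory and returns the size a ZipInputStream reports up front.
    static long sizeSeenByReader(byte[] payload) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ZipOutputStream zos = new ZipOutputStream(bos)) {
                addEntry(zos, "data", new ByteArrayInputStream(payload));
            }
            try (ZipInputStream zis = new ZipInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                return zis.getNextEntry().getSize();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = new byte[10000];
        for (int i = 0; i < payload.length; i++) payload[i] = (byte) (i % 251);
        System.out.println("Entry size: " + sizeSeenByReader(payload));
    }
}
```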
