Gets the uncompressed size of this GZIPInputStream?

Submitted by 时光总嘲笑我的痴心妄想 on 2019-11-30 18:45:16

Is there a method similar to ZipEntry.getSize() for GZIPInputStream?

No. It's not in the Javadoc => it doesn't exist.

What do you need the length for?

Alexander Gräf

It is possible to determine the uncompressed size by reading the last four bytes of the gzipped file.

I found this solution here:

http://www.abeel.be/content/determine-uncompressed-size-gzip-file

Also from this link there is some example code (corrected to use long instead of int, to cope with sizes between 2GB and 4GB which would make an int wrap around):

RandomAccessFile raf = new RandomAccessFile(file, "r");
raf.seek(raf.length() - 4);
// read() returns an int in 0..255; a byte would not compile here and would sign-extend
int b4 = raf.read();
int b3 = raf.read();
int b2 = raf.read();
int b1 = raf.read();
long val = ((long)b1 << 24) | ((long)b2 << 16) | ((long)b3 << 8) | (long)b4;
raf.close();

val is the length in bytes. Beware: the gzip trailer stores the size modulo 2^32, so you cannot determine the correct uncompressed size when the uncompressed file was larger than 4GB!
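To see the trailer and the 4GB limit in isolation, here is a minimal in-memory sketch. The class and method names (IsizeDemo, gzip, readIsize, storedIsize) are made up for illustration; only the java.util.zip and java.io calls are real API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

class IsizeDemo {
    // Compresses data into a single-member gzip stream held in memory.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        }
        return bos.toByteArray();
    }

    // Reads the last four bytes as an unsigned little-endian 32-bit value.
    static long readIsize(byte[] gz) {
        int p = gz.length - 4;
        return (gz[p] & 0xFFL)
             | (gz[p + 1] & 0xFFL) << 8
             | (gz[p + 2] & 0xFFL) << 16
             | (gz[p + 3] & 0xFFL) << 24;
    }

    // The value a gzip writer stores for a given true size: size modulo 2^32.
    static long storedIsize(long trueSize) {
        return trueSize & 0xFFFFFFFFL;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readIsize(gzip(new byte[1234])));
        System.out.println(storedIsize(5L << 30));
    }
}
```

For a 1234-byte input the trailer reads back as 1234. A 5 GiB input would be stored as 1 GiB, which is why the trailer alone cannot tell the two apart.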

Based on @Alexander's answer:

RandomAccessFile raf = new RandomAccessFile(inputFilePath + ".gz", "r");
raf.seek(raf.length() - 4);
byte[] bytes = new byte[4];
raf.readFully(bytes);  // read() is allowed to return fewer than 4 bytes
// fileSize must be a long, otherwise the correction below has no effect
long fileSize = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
if (fileSize < 0)
  fileSize += (1L << 32);
raf.close();
Mark Adler

There is no reliable way to get the length other than decompressing the whole thing. See Uncompressed file size using zlib's gzip file access function .
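One reason the trailer is unreliable, besides the 4GB wrap-around, is that the gzip format (RFC 1952) allows a file to hold several members written back to back, and the last four bytes describe only the final member. The sketch below illustrates this; the names (MultiMemberDemo, concatenated, trailerSize, decompressedSize) are hypothetical, and it assumes a JDK (9 or later, for readAllBytes) whose GZIPInputStream continues into subsequent members.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class MultiMemberDemo {
    // Compresses data into a single gzip member.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        }
        return bos.toByteArray();
    }

    // Two complete members back to back, as "cat a.gz b.gz" would produce.
    static byte[] concatenated(int firstLen, int secondLen) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        bos.write(gzip(new byte[firstLen]));
        bos.write(gzip(new byte[secondLen]));
        return bos.toByteArray();
    }

    // Size according to the last four bytes (unsigned, little-endian).
    static long trailerSize(byte[] gz) {
        return ByteBuffer.wrap(gz, gz.length - 4, 4)
                .order(ByteOrder.LITTLE_ENDIAN).getInt() & 0xFFFFFFFFL;
    }

    // Size obtained by actually decompressing everything.
    static long decompressedSize(byte[] gz) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            return in.readAllBytes().length;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] gz = concatenated(1000, 2000);
        System.out.println("trailer:      " + trailerSize(gz));
        System.out.println("decompressed: " + decompressedSize(gz));
    }
}
```

Here the trailer reports only the second member's length, while decompressing yields the sum of both.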

If you can guess at the compression ratio (a reasonable expectation if the data is similar to other data you've already processed), then you can work out the size of arbitrarily large files (with some error). Again, this assumes a file containing a single gzip stream. The following assumes the first size greater than 90% of the estimated size (based on estimated ratio) is the true size:

double estCompRatio = 6.1;
RandomAccessFile raf = new RandomAccessFile(inputFilePath + ".gz", "r");
long compLength = raf.length();
raf.seek(compLength - 4);  // the size field is the last four bytes of the file
byte[] bytes = new byte[4];
raf.readFully(bytes);
raf.close();
// the field is little-endian (readInt() would read big-endian); mask to treat it as unsigned
long uncLength = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt() & 0xFFFFFFFFL;
while (uncLength < (compLength * estCompRatio * 0.9)) {
  uncLength += (1L << 32);
}

[setting estCompRatio to 0 is equivalent to @Alexander's answer]
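The arithmetic of that loop can be tried out on its own, without a file. The following is only a sketch; SizeEstimate and estimateSize are illustrative names, and the inputs in main are invented numbers, not measurements.

```java
class SizeEstimate {
    // Adds multiples of 2^32 to the trailer value until it reaches
    // 90% of the size predicted from the estimated compression ratio.
    static long estimateSize(long isize, long compLength, double estCompRatio) {
        long size = isize;
        double floor = compLength * estCompRatio * 0.9;
        while (size < floor) {
            size += 1L << 32;
        }
        return size;
    }

    public static void main(String[] args) {
        // Hypothetical case: 900,000,000 compressed bytes, trailer value of 1 GiB.
        System.out.println(estimateSize(1L << 30, 900_000_000L, 6.1));
        // A ratio of 0 leaves the trailer value unchanged.
        System.out.println(estimateSize(1L << 30, 900_000_000L, 0));
    }
}
```

With these inputs the threshold is about 4.94 billion bytes, so one multiple of 2^32 is added and the estimate is 5 GiB (5368709120 bytes).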

No, unfortunately. To get the uncompressed size you have to read the entire stream and increment a counter, as you mention in your question. Why do you need to know the size? Could an estimate work for your purposes?
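Counting while reading could look roughly like this. It is a sketch rather than a tuned implementation; GzipCount and countUncompressed are hypothetical names, and the 8 KB buffer size is an arbitrary choice.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

class GzipCount {
    // Decompresses the whole stream, discards the data, and returns
    // the number of uncompressed bytes. Closes the given stream.
    static long countUncompressed(InputStream compressed) throws IOException {
        long total = 0;
        byte[] buf = new byte[8192];
        try (InputStream in = new GZIPInputStream(compressed)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }
}
```

This costs a full decompression pass, but the result does not depend on the trailer, so it is not limited to 4GB.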

A more compact version of the calculation based on the last 4 bytes. It avoids the byte buffer: readInt() reads the value as big-endian, and Integer.reverseBytes converts it to the little-endian order gzip uses.

private static long getUncompressedSize(Path inputPath) throws IOException
{
    long size = -1;
    try (RandomAccessFile fp = new RandomAccessFile(inputPath.toFile(), "r")) {        
        fp.seek(fp.length() - Integer.BYTES);
        int n = fp.readInt();
        size = Integer.toUnsignedLong(Integer.reverseBytes(n));
    }
    return size;
}

Get the FileChannel from the underlying FileInputStream instead. It tells you both the size of the compressed file and the current position within it, which is enough to report progress. Example:

@Override
public void produce(final DataConsumer consumer, final boolean skipData) throws IOException {
    try (FileInputStream fis = new FileInputStream(tarFile)) {
        FileChannel channel = fis.getChannel();
        final Eta<Long> eta = new Eta<>(channel.size());
        try (InputStream is = tarFile.getName().toLowerCase().endsWith("gz")
            ? new GZIPInputStream(fis) : fis) {
            try (TarArchiveInputStream tais = (TarArchiveInputStream) new ArchiveStreamFactory()
                .createArchiveInputStream("tar", new BufferedInputStream(is))) {

                TarArchiveEntry tae;
                boolean done = false;
                while (!done && (tae = tais.getNextTarEntry()) != null) {
                    if (tae.getName().startsWith("docs/") && tae.getName().endsWith(".html")) {
                        String data = null;
                        if (!skipData) {
                            data = new String(tais.readNBytes((int) tae.getSize()), StandardCharsets.UTF_8);
                        }
                        done = !consumer.consume(data);
                    }

                    String progress = eta.toStringPeriodical(channel.position());
                    if (progress != null) {
                        System.out.println(progress);
                    }
                }
                System.out.println("tar bytes read: " + tais.getBytesRead());
            } catch (ArchiveException ex) {
                throw new IOException(ex);
            }
        }
    }
}
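
The example above relies on a custom Eta class and on Apache Commons Compress. A stripped-down sketch of the same idea using only the JDK might look like this; GzipProgress and drainWithProgress are hypothetical names, and the percentage is approximate because GZIPInputStream reads ahead in blocks.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

class GzipProgress {
    // Decompresses the file, discarding the data, and prints how much of the
    // compressed input has been consumed. Returns the uncompressed byte count.
    static long drainWithProgress(Path gzFile) throws IOException {
        try (FileInputStream fis = new FileInputStream(gzFile.toFile());
             InputStream in = new GZIPInputStream(fis)) {
            FileChannel channel = fis.getChannel();
            long total = channel.size();   // compressed size on disk
            byte[] buf = new byte[8192];
            long uncompressed = 0;
            int lastPercent = -1;
            int n;
            while ((n = in.read(buf)) != -1) {
                uncompressed += n;
                // position() is the offset in the compressed file, not in the output
                int percent = total == 0 ? 100 : (int) (100 * channel.position() / total);
                if (percent != lastPercent) {
                    System.out.println(percent + "% of compressed input consumed");
                    lastPercent = percent;
                }
            }
            return uncompressed;
        }
    }
}
```

Progress is measured against the compressed size, so no knowledge of the uncompressed size is needed.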