java.util.zip - ZipInputStream vs. ZipFile

情书的邮戳 2021-01-05 05:42

I have some general questions regarding the java.util.zip library. What we basically do is import and export many small components. Previously these c

3 Answers
  • 2021-01-05 06:23

    I measured that just listing the files with ZipInputStream is 8 times slower than with ZipFile.

    // Time how long it takes to list matching entries via ZipFile,
    // which reads the central directory at the end of the archive.
    long t = System.nanoTime();
    ZipFile zip = new ZipFile(jarFile);
    Enumeration<? extends ZipEntry> entries = zip.entries();
    while (entries.hasMoreElements())
    {
        ZipEntry entry = entries.nextElement();

        String filename = entry.getName();
        if (!filename.startsWith(JAR_TEXTURE_PATH))
            continue;

        textureFiles.add(filename);
    }
    zip.close();
    System.out.println((System.nanoTime() - t) / 1e9); // elapsed seconds
    

    and

    // Same listing via ZipInputStream, which scans the local entry headers sequentially.
    long t = System.nanoTime();
    ZipInputStream zip = new ZipInputStream(new FileInputStream(jarFile));
    ZipEntry entry;
    while ((entry = zip.getNextEntry()) != null)
    {
        String filename = entry.getName();
        if (!filename.startsWith(JAR_TEXTURE_PATH))
            continue;

        textureFiles.add(filename);
    }
    zip.close();
    System.out.println((System.nanoTime() - t) / 1e9); // elapsed seconds
    

    (Don't run them in the same class; make two separate classes and run them separately.)

  • 2021-01-05 06:35

    Regarding Q3, experience in JENKINS-14362 suggests that zlib is not thread-safe even when operating on unrelated streams, i.e. that it has some improperly shared static state. Not proven, just a warning.
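
    If you want to play it safe given that warning, a minimal defensive sketch is to funnel all zip reads through a single shared lock, so zlib never runs concurrently even for unrelated archives. The lock object and helper name here are purely illustrative, not part of any library API:

        // Hypothetical helper: serialize all zip decompression behind one shared lock.
        private static final Object ZIP_LOCK = new Object();

        static byte[] readEntry(java.util.zip.ZipFile zip, java.util.zip.ZipEntry entry)
                throws java.io.IOException {
            synchronized (ZIP_LOCK) {
                // Only one thread at a time touches zlib via the entry's input stream.
                try (java.io.InputStream in = zip.getInputStream(entry)) {
                    return in.readAllBytes(); // Java 9+; copy through a buffer on older JDKs
                }
            }
        }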

  • 2021-01-05 06:46

    Q1: yes, the order will be the same as the order in which the entries were added.
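
    A quick way to see this is to write a small archive and read it back; the entries come out in insertion order. This is just a self-contained demo with made-up entry names:

        import java.io.*;
        import java.util.zip.*;

        public class EntryOrderDemo {
            public static void main(String[] args) throws IOException {
                File f = File.createTempFile("order-demo", ".zip");
                // Write three entries in a fixed order.
                try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(f))) {
                    for (String name : new String[] { "a.txt", "b.txt", "c.txt" }) {
                        out.putNextEntry(new ZipEntry(name));
                        out.closeEntry();
                    }
                }
                // Read them back; they are listed in the same order: a.txt, b.txt, c.txt.
                try (ZipInputStream in = new ZipInputStream(new FileInputStream(f))) {
                    ZipEntry e;
                    while ((e = in.getNextEntry()) != null)
                        System.out.println(e.getName());
                }
                f.delete();
            }
        }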

    Q2: note that, due to the structure of zip archives and to compression, neither solution is truly streaming; both do some level of buffering, and if you check the JDK sources, the implementations share most of their code. There is no real random access within the compressed content, although the central directory index does allow locating the chunks that correspond to individual entries. So I would not expect meaningful performance differences, especially since the OS caches disk blocks anyway. You may want to run a simple test case to verify this yourself.

    Q3: I would not count on it; most likely they are not thread-safe. If you really think concurrent access would help (it might, since decompression is CPU-bound), I'd try reading the whole file into memory, exposing it via a ByteArrayInputStream, and constructing multiple independent readers, as sketched below.
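
    A rough sketch of that idea, assuming the archive fits comfortably in memory (the file name and thread count below are placeholders):

        import java.io.*;
        import java.nio.file.*;
        import java.util.zip.*;

        public class InMemoryZipReaders {
            public static void main(String[] args) throws Exception {
                // Load the whole archive once; each reader then works on its own stream.
                byte[] bytes = Files.readAllBytes(Paths.get("components.zip"));

                Thread[] readers = new Thread[4];
                for (int i = 0; i < readers.length; i++) {
                    readers[i] = new Thread(() -> {
                        // Independent ZipInputStream per thread: no shared inflater state.
                        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(bytes))) {
                            ZipEntry entry;
                            while ((entry = zip.getNextEntry()) != null)
                                System.out.println(Thread.currentThread().getName() + ": " + entry.getName());
                        } catch (IOException e) {
                            e.printStackTrace();
                        }
                    });
                    readers[i].start();
                }
                for (Thread t : readers) t.join();
            }
        }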
