Question
background
I stumbled across this problem here
analysis
According to the Javadoc for ZipEntry, requesting the size of a zip file entry sometimes simply returns -1 (the documented value when the size is not known).
However, running the command
$ unzip -l b17c024e-89f1-42f7-a546-91d46610cedb.epub
Archive:  b17c024e-89f1-42f7-a546-91d46610cedb.epub
   Length     Date   Time    Name
 --------    ----   ----    ----
       20  01-27-12 11:17   mimetype
     2378  04-20-12 10:12   OEBPS/hayat-ghayr.html
     6436  02-06-12 11:06   OEBPS/content.opf
   112579  01-27-12 11:25   OEBPS/images/978-614-425-313-7-hayat-ghayr-cover.png
   182575  01-27-12 11:25   OEBPS/images/978-614-425-313-7-hayat_fmt.png
     7757  01-27-12 11:21   OEBPS/template.css
     5643  01-27-12 11:18   OEBPS/hayat-ghayr-2.html
    20144  01-27-12 11:17   OEBPS/hayat-ghayr-1.html
    65543  01-27-12 11:17   OEBPS/hayat-ghayr-3.html
    59434  01-27-12 11:17   OEBPS/hayat-ghayr-4.html
    66768  01-27-12 11:17   OEBPS/hayat-ghayr-5.html
    49117  01-27-12 11:17   OEBPS/hayat-ghayr-6.html
    65346  01-27-12 11:17   OEBPS/hayat-ghayr-7.html
    74196  01-27-12 11:17   OEBPS/hayat-ghayr-8.html
    73998  01-27-12 11:17   OEBPS/hayat-ghayr-9.html
    61031  01-27-12 11:17   OEBPS/hayat-ghayr-10.html
    68297  01-27-12 11:17   OEBPS/hayat-ghayr-11.html
    72084  01-27-12 11:17   OEBPS/hayat-ghayr-12.html
     2386  01-27-12 11:17   OEBPS/hayat-ghayr-13.html
    61132  01-27-12 11:17   OEBPS/hayat-ghayr-14.html
    46320  01-27-12 11:17   OEBPS/hayat-ghayr-15.html
    32673  01-27-12 11:17   OEBPS/hayat-ghayr-16.html
    88584  01-27-12 11:17   OEBPS/hayat-ghayr-17.html
    56474  01-27-12 11:17   OEBPS/hayat-ghayr-18.html
    52840  01-27-12 11:17   OEBPS/hayat-ghayr-19.html
    80022  01-27-12 11:17   OEBPS/hayat-ghayr-20.html
    50781  01-27-12 11:17   OEBPS/hayat-ghayr-21.html
     2765  01-27-12 11:17   OEBPS/hayat-ghayr-22.html
      265  01-27-12 11:17   META-INF/container.xml
    54942  01-27-12 11:17   OEBPS/images/277.png
     5549  01-27-12 11:17   OEBPS/toc.ncx
     1072  03-23-12 13:28   iTunesMetadata.plist
 --------                   -------
  1529151                   32 files
shows that there is a content length for every chapter. Moreover, if we unzip the same file and zip it again with stronger compression, the Java zip code returns the proper content size.
question
Is this the zip library's fault or the fault of the original compression? How can we tell?
follow up question
see How to access a zipEntry from a streamed zip file in memory
Answer 1:
ZIP stores metadata in a few different places inside the archive (the "local file header", the "central directory" and sometimes a "data descriptor"). Only the "local file header" sits in front of the file's content; the "central directory" is at the very end of the archive. Only the "central directory" holds the full truth: it is perfectly valid not to specify any size in the "local file header".
See sections 4.4.8/4.4.9 of https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT, which describe the size fields:
If bit 3 of the general purpose bit flag is set, these fields are set to zero in the local header and the correct values are put in the data descriptor and in the central directory.
The "data descriptor" immediately follows the compressed content of the entry - and thus is not available before reading the actual content of the entry when reading from a non-seekable stream.
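To check whether a given archive was written this way, one option is to look at the first local file header directly. The sketch below is only an illustration: the class name LocalHeaderPeek and its helpers are made up for this example, it only inspects the entry at offset 0, and it assumes the field offsets of the local file header from APPNOTE.TXT (flags at byte 6, uncompressed size at byte 22, little-endian). The sample archives are produced with java.util.zip.ZipOutputStream, which defers the sizes to a data descriptor when they are not set before writing a deflated entry.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, named for this example only.
class LocalHeaderPeek {
    static final int LOCAL_HEADER_SIGNATURE = 0x04034b50;

    // Wraps the archive bytes and verifies that a local file header starts at offset 0.
    private static ByteBuffer header(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        if (zip.length < 30 || buf.getInt(0) != LOCAL_HEADER_SIGNATURE) {
            throw new IllegalArgumentException("no local file header at offset 0");
        }
        return buf;
    }

    // True if bit 3 of the general purpose bit flag is set for the first entry.
    static boolean usesDataDescriptor(byte[] zip) {
        int flags = header(zip).getShort(6) & 0xFFFF;
        return (flags & 0x0008) != 0;
    }

    // Uncompressed size as recorded in the first local file header.
    static long localUncompressedSize(byte[] zip) {
        return header(zip).getInt(22) & 0xFFFFFFFFL;
    }

    // Builds a one-entry archive; sizes are either given up front or deferred.
    static byte[] sampleZip(byte[] content, boolean sizeKnownUpFront) {
        ZipEntry entry = new ZipEntry("sample.txt");
        if (sizeKnownUpFront) {
            CRC32 crc = new CRC32();
            crc.update(content);
            entry.setMethod(ZipEntry.STORED);
            entry.setSize(content.length);
            entry.setCrc(crc.getValue());
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(entry);
            out.write(content);
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] content = "hello zip".getBytes(StandardCharsets.UTF_8);
        for (boolean known : new boolean[] {false, true}) {
            byte[] zip = sampleZip(content, known);
            System.out.println("size set up front: " + known
                    + ", bit 3 set: " + usesDataDescriptor(zip)
                    + ", size in local header: " + localUncompressedSize(zip));
        }
    }
}
```

If bit 3 is set for the entries of the problematic EPUB, the missing size comes from how the archive was written, not from a bug in the reading library.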
When using ZipArchiveInputStream you obtain the ZipEntry as soon as the "local file header" has been read (because the underlying stream may not be seekable), so the size information may be missing. ZipFile uses RandomAccessFile under the covers and can read the "central directory" - as do unzip and friends - so they know more than ZipArchiveInputStream.
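The difference between the two ways of reading can be reproduced with a small experiment. This is a rough sketch, not the asker's code: to stay dependency-free it uses the JDK's java.util.zip.ZipInputStream in place of Commons Compress' ZipArchiveInputStream (both read the local file header first), and the class name ZipSizeDemo, its method names and the entry name hello.txt are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, named for this example only.
class ZipSizeDemo {

    // Builds a one-entry archive in memory. The size is not set up front,
    // so the writer defers it to a data descriptor after the entry data.
    static byte[] buildZip(String name, byte[] content) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry(name));
            out.write(content);
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Streaming view: only the local file header has been read at this point.
    static long sizeSeenByStream(byte[] zip) {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip))) {
            return in.getNextEntry().getSize();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Streaming fallback: count the bytes while consuming the entry.
    static long sizeByReadingStream(byte[] zip) {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip))) {
            in.getNextEntry();
            long total = 0;
            byte[] buffer = new byte[8192];
            for (int n; (n = in.read(buffer)) != -1; ) {
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Random-access view: ZipFile reads the central directory of a real file.
    static long sizeSeenByZipFile(byte[] zip, String name) {
        try {
            Path tmp = Files.createTempFile("zip-size-demo", ".zip");
            try {
                Files.write(tmp, zip);
                try (ZipFile file = new ZipFile(tmp.toFile())) {
                    return file.getEntry(name).getSize();
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] content = "hello zip".getBytes(StandardCharsets.UTF_8);
        byte[] zip = buildZip("hello.txt", content);
        System.out.println("ZipInputStream, before reading: " + sizeSeenByStream(zip));
        System.out.println("ZipInputStream, bytes counted:  " + sizeByReadingStream(zip));
        System.out.println("ZipFile (central directory):    " + sizeSeenByZipFile(zip, "hello.txt"));
    }
}
```

For an archive written like this, the streaming reader reports -1 before the entry has been consumed, while ZipFile reports the real length. If the archive only exists as a stream, either count the bytes while reading or spool it to a temporary file and open that with ZipFile.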
Source: https://stackoverflow.com/questions/36788065/why-do-certain-zip-files-have-unknown-file-content