Question
Packaging a folder on a SUSE Linux Enterprise Server 12 SP3 system using GNU tar 1.30 always gives different md5 checksums although the file contents do not change.
I run tar to package my folder that contains a simple text file:
tar cf package.tar folder
Although the content is exactly the same, the resulting archive has a different md5 (or sha1) checksum every time:
$> rm -rf package.tar && tar cf package.tar folder && md5sum package.tar
e6383218596fffe118758b46e0edad1d package.tar
$> rm -rf package.tar && tar cf package.tar folder && md5sum package.tar
1c5aa972e5bfa2ec78e63a9b3116e027 package.tar
Because the Linux file system seems to deliver files to tar in a random order, I tried the --sort option, but that does not fix the checksum issue for me. tar's --mtime option does not help either, since the file timestamps are exactly the same.
I appreciate any help on this.
Answer 1:
The archives you provided contain pax extended headers. A quick glance at their structure reveals that they differ in these two fields:
- The process ID of the tar process (it is part of the extended header's name in the ustar header block, and therefore also changes the checksum of that ustar header block).
- The atime (access time) in the extended header.
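To look at these differences yourself, a sketch along the following lines may help. It assumes GNU tar and GNU grep; the scratch directory and file names are made up for illustration, and what it prints depends on your tar version and on how the file system handles access times.

```shell
#!/bin/sh
# Sketch: create two pax-format archives of the same folder and look for
# differences between them. Assumes GNU tar and GNU grep; all paths are
# hypothetical.
set -eu

scratch=$(mktemp -d)
cd "$scratch"
mkdir folder
echo "hello" > folder/file.txt

tar --format=pax -cf first.tar folder
tar --format=pax -cf second.tar folder

# Extended header names of the form "PaxHeaders.<pid>", if any are present.
names1=$(grep -a -o 'PaxHeaders\.[0-9]*' first.tar | sort -u)
names2=$(grep -a -o 'PaxHeaders\.[0-9]*' second.tar | sort -u)
echo "first.tar:  $names1"
echo "second.tar: $names2"

# cmp -l prints one line per differing byte; count those lines.
diff_bytes=$(cmp -l first.tar second.tar | wc -l)
echo "differing bytes: $diff_bytes"
```

A non-zero byte count with no change to the folder's contents points at header metadata rather than file data.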
One workaround for reproducible archive creation is to enforce the old Unix ustar format (rather than the pax/posix format):
tar --format=ustar -cf package.tar folder
The other choice is to keep the pax format, but set the extended header name manually and delete the atime:
tar --format=pax --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime -cf package.tar folder
Now the md5sum should be the same for both archives.
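A quick way to check this is to build the archive twice and compare the checksums. The following is a minimal sketch, assuming GNU tar and md5sum are available; the scratch directory, the file names, and the build_archive helper are made up for illustration.

```shell
#!/bin/sh
# Sketch: build the same folder twice with the pax options shown above and
# compare the checksums. Assumes GNU tar; all names are hypothetical.
set -eu

scratch=$(mktemp -d)
cd "$scratch"
mkdir folder
echo "hello" > folder/file.txt

# Hypothetical helper: $1 is the archive file to write.
build_archive() {
    tar --format=pax \
        --pax-option='exthdr.name=%d/PaxHeaders/%f,delete=atime' \
        -cf "$1" folder
}

build_archive first.tar
build_archive second.tar

sum1=$(md5sum first.tar | cut -d' ' -f1)
sum2=$(md5sum second.tar | cut -d' ' -f1)

if [ "$sum1" = "$sum2" ]; then
    echo "identical: $sum1"
else
    echo "different: $sum1 vs $sum2"
fi
```

Swapping the tar invocation in build_archive for the --format=ustar variant lets you check that workaround the same way.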
Answer 2:
The headers in a tar file contain several fields that can differ each time you re-tar a set of files, for instance the last access time and the modification time.
According to this article it is possible with GNU tar to produce identical output for identical input by doing the following:
# requires GNU Tar 1.28+
$ tar --sort=name \
--mtime="2018-10-05 00:00Z" \
--owner=0 --group=0 --numeric-owner \
-cf product.tar build
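As a rough check, the sketch below touches a file between two builds and compares the resulting checksums. It assumes GNU tar 1.28 or later; the build directory contents and the make_tar helper are made up for illustration. It also adds the --format=ustar option from the first answer, which is my assumption rather than part of the command above, so that the result does not depend on the distribution's default archive format.

```shell
#!/bin/sh
# Sketch: normalized metadata should give the same checksum even after a
# file's modification time changes. Assumes GNU tar 1.28+; names are
# hypothetical.
set -eu

scratch=$(mktemp -d)
cd "$scratch"
mkdir build
echo "hello" > build/file.txt

# Hypothetical helper: $1 is the archive file to write.
# --format=ustar is an addition taken from the first answer.
make_tar() {
    tar --format=ustar \
        --sort=name \
        --mtime="2018-10-05 00:00Z" \
        --owner=0 --group=0 --numeric-owner \
        -cf "$1" build
}

make_tar before.tar
sleep 1
touch build/file.txt   # changes the mtime, not the content
make_tar after.tar

before_sum=$(md5sum before.tar | cut -d' ' -f1)
after_sum=$(md5sum after.tar | cut -d' ' -f1)

if [ "$before_sum" = "$after_sum" ]; then
    echo "identical: $before_sum"
else
    echo "different: $before_sum vs $after_sum"
fi
```

File permissions are still recorded as they are on disk, so a chmod between builds would change the checksum.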
Source: https://stackoverflow.com/questions/52668432/tar-package-has-different-checksum-for-exactly-the-same-content