How to compare the contents of two tarballs

北海茫月 2021-01-31 15:18

I want to tell whether two tarball files contain identical files, in terms of file name and file content, ignoring metadata such as date, user, and group.

However, There

11 answers
  • 2021-01-31 16:11

    There is also diffoscope, which is more generic and lets you compare things recursively (including various archive formats).

    pip install diffoscope
    
  • 2021-01-31 16:12

    I propose gtarsum, which I wrote in Go, so it is a standalone executable (no Python or other runtime environment needed).

    go get github.com/VonC/gtarsum
    

    It will read a tar file, and:

    • sort the list of files alphabetically,
    • compute a SHA256 for each file content,
    • concatenate those hashes into one giant string
    • compute the SHA256 of that string

    The result is a "global hash" for a tar file, based on the list of files and their content.

    It can compare multiple tar files, and return 0 if they are identical, 1 if they are not.
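
    The steps above can be sketched in Python using only the standard library. This is a rough illustration of the idea, not gtarsum's actual code: the function names `tar_global_hash` and `tars_identical` are hypothetical, and folding each file name into the hashed string alongside its digest is an assumption made so that renames change the result.

```python
import hashlib
import tarfile


def tar_global_hash(path):
    """Return one SHA-256 hex digest summarising file names and contents.

    Metadata such as mtime, owner and group is never read, so it cannot
    influence the result. Only regular files are considered.
    """
    entries = []
    with tarfile.open(path) as tar:  # compression is auto-detected
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links, devices
            digest = hashlib.sha256()
            stream = tar.extractfile(member)
            # Hash in chunks so large members are not loaded whole.
            for chunk in iter(lambda: stream.read(65536), b""):
                digest.update(chunk)
            entries.append((member.name, digest.hexdigest()))
    entries.sort()  # order inside the archive must not matter
    # Assumption: include the name next to each digest in the combined string.
    combined = "".join(f"{digest}  {name}\n" for name, digest in entries)
    return hashlib.sha256(combined.encode("utf-8", "surrogateescape")).hexdigest()


def tars_identical(*paths):
    """True if every archive yields the same global hash."""
    return len({tar_global_hash(p) for p in paths}) <= 1
```

    To mirror the exit-code behaviour described above, a script could end with `sys.exit(0 if tars_identical(*sys.argv[1:]) else 1)`.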

  • 2021-01-31 16:13

    There is a tool called archdiff. It is basically a Perl script that can look inside archives.

    Takes two archives, or an archive and a directory and shows a summary of the
    differences between them.
    
  • 2021-01-31 16:17

    tarsum is almost what you need. Take its output, run it through sort so the ordering is identical for both archives, and then compare the two with diff. That should get you a basic implementation going, and it would be easy enough to pull those steps into the main program by modifying the Python code to do the whole job.
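
    As a rough sketch of what "the whole job" in Python might look like, the following computes a checksum per file, sorts the listing, and diffs the two listings. It does not reuse tarsum itself; the `digest  name` line format and the function names `tar_listing` and `diff_tarballs` are assumptions for illustration.

```python
import difflib
import hashlib
import tarfile


def tar_listing(path):
    """Return 'digest  name' lines for each regular file, sorted by name."""
    rows = []
    with tarfile.open(path) as tar:
        for member in tar.getmembers():
            if member.isfile():
                # Reads each member fully into memory; fine for a sketch.
                data = tar.extractfile(member).read()
                rows.append((member.name, hashlib.sha256(data).hexdigest()))
    rows.sort()  # same ordering for both archives, like piping through sort
    return [f"{digest}  {name}" for name, digest in rows]


def diff_tarballs(path_a, path_b):
    """Return unified-diff lines between the listings (empty if identical)."""
    return list(
        difflib.unified_diff(
            tar_listing(path_a),
            tar_listing(path_b),
            fromfile=path_a,
            tofile=path_b,
            lineterm="",
        )
    )
```

    An empty result from `diff_tarballs` means both archives hold the same file names with the same contents; otherwise the diff lines show which files were added, removed, or changed.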

  • 2021-01-31 16:18

    Also try pkgdiff to visualize differences between packages (it detects added, removed, and renamed files as well as changed content, and exits with code zero if nothing changed):

    pkgdiff PKG-0.tgz PKG-1.tgz
    
