I want to tell whether two tarball files contain identical files, in terms of file name and file content, not including meta-data like date, user, group.
However, There
There is also diffoscope, which is more generic, and allows to compare things recursively (including various formats).
pip install diffoscope
I propose gtarsum, that I have written in Go, which means it will be an autonomous executable (no Python or other execution environment needed).
go get github.com/VonC/gtarsum
It will read a tar file, and:
The result is a "global hash" for a tar file, based on the list of files and their content.
It can compare multiple tar files, and return 0 if they are identical, 1 if they are not.
There is tool called archdiff. It is basically a perl script that can look into the archives.
Takes two archives, or an archive and a directory and shows a summary of the
differences between them.
tarsum is almost what you need. Take its output, run it through sort to get the ordering identical on each, and then compare the two with diff. That should get you a basic implementation going, and it would be easily enough to pull those steps into the main program by modifying the Python code to do the whole job.
Try also pkgdiff to visualize differences between packages (detects added/removed/renamed files and changed content, exist with zero code if unchanged):
pkgdiff PKG-0.tgz PKG-1.tgz