Question
I have an archive that I do not want to extract, but I need to check, for each of its entries, whether it is a file or a directory.
os.path.isdir and os.path.isfile do not work because I am working on an archive. The archive can be any one of tar, bz2, zip or tar.gz (so I cannot use their specific libraries). Also, the code should work on any platform, such as Linux or Windows. Can anybody help me with how to do this?
Answer 1:
You've stated that you need to support "tar, bz2, zip or tar.gz". Python's tarfile module automatically handles gzip- and bzip2-compressed tar files, so there are really only two archive types that you need to support: tar and zip. (bz2 by itself is not an archive format; it is just compression.)
You can determine whether a given file is a tar file with tarfile.is_tarfile(). This also works on tar files compressed with gzip or bzip2. Within a tar file, you can determine whether an entry is a directory using TarInfo.isdir(), or a regular file using TarInfo.isfile().
Similarly, you can determine whether a file is a zip file using zipfile.is_zipfile(). Older versions of zipfile have no method to distinguish directories from normal files, but entries whose names end with / are directories. (Python 3.6 and later add ZipInfo.is_dir(), which makes this check for you.)
So, given a file name, you can do this:
import zipfile
import tarfile

filename = 'test.tgz'

if tarfile.is_tarfile(filename):
    # Also matches gzip- and bzip2-compressed tar files.
    with tarfile.open(filename) as f:
        for info in f:
            if info.isdir():
                file_type = 'directory'
            elif info.isfile():
                file_type = 'file'
            else:
                file_type = 'unknown'
            print('{} is a {}'.format(info.name, file_type))
elif zipfile.is_zipfile(filename):
    with zipfile.ZipFile(filename) as f:
        for name in f.namelist():
            # Directory entries in a zip archive end with '/'.
            print('{} is a {}'.format(name, 'directory' if name.endswith('/') else 'file'))
else:
    print('{} is not an accepted archive file'.format(filename))
When run on a tar file with this structure:
(py2)[mhawke@localhost tmp]$ tar tvfz /tmp/test.tgz
drwxrwxr-x mhawke/mhawke 0 2016-02-29 12:38 x/
lrwxrwxrwx mhawke/mhawke 0 2016-02-29 12:38 x/4 -> 3
drwxrwxr-x mhawke/mhawke 0 2016-02-28 21:14 x/3/
drwxrwxr-x mhawke/mhawke 0 2016-02-28 21:14 x/3/4/
-rw-rw-r-- mhawke/mhawke 0 2016-02-28 21:14 x/3/4/zzz
drwxrwxr-x mhawke/mhawke 0 2016-02-28 21:13 x/2/
-rw-rw-r-- mhawke/mhawke 0 2016-02-28 21:13 x/2/aa
drwxrwxr-x mhawke/mhawke 0 2016-02-28 21:13 x/1/
-rw-rw-r-- mhawke/mhawke 0 2016-02-28 21:13 x/1/abc
-rw-rw-r-- mhawke/mhawke 0 2016-02-28 21:13 x/1/ab
-rw-rw-r-- mhawke/mhawke 0 2016-02-28 21:13 x/1/a
The output is:
x is a directory
x/4 is a unknown
x/3 is a directory
x/3/4 is a directory
x/3/4/zzz is a file
x/2 is a directory
x/2/aa is a file
x/1 is a directory
x/1/abc is a file
x/1/ab is a file
x/1/a is a file
Notice that x/4 is "unknown" because it is a symbolic link.
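If symbolic links should be reported by name rather than as "unknown", TarInfo also offers issym() and islnk(). The following is only a minimal sketch: tar_entry_kind and build_sample_tar are hypothetical helpers (not part of tarfile), and the archive is built in memory with made-up entries purely for illustration.

```python
import io
import tarfile


def tar_entry_kind(info):
    """Return a short label for a TarInfo entry (hypothetical helper)."""
    if info.isdir():
        return 'directory'
    if info.isfile():
        return 'file'
    if info.issym():
        return 'symlink'
    if info.islnk():
        return 'hard link'
    return 'unknown'


def build_sample_tar():
    """Build a tiny gzip-compressed tar in memory; the entries are made up."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w:gz') as tf:
        directory = tarfile.TarInfo('x')
        directory.type = tarfile.DIRTYPE
        tf.addfile(directory)

        link = tarfile.TarInfo('x/4')
        link.type = tarfile.SYMTYPE
        link.linkname = '3'
        tf.addfile(link)

        data = b'hello'
        regular = tarfile.TarInfo('x/a')
        regular.size = len(data)
        tf.addfile(regular, io.BytesIO(data))
    buf.seek(0)
    return buf


# Mode defaults to 'r', which detects the compression transparently.
with tarfile.open(fileobj=build_sample_tar()) as tf:
    kinds = {info.name: tar_entry_kind(info) for info in tf}

for name, kind in kinds.items():
    print('{} is a {}'.format(name, kind))
```

Nothing is extracted at any point; only the member headers are read.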
There is no easy way, with zipfile, to distinguish a symlink (or other file types) from a directory or normal file. The information is there in the ZipInfo.external_attr attribute, but it's messy to get it back out:
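import stat

# f is the ZipFile from the example above; entry 1 is assumed to be the symlink.
linked_file = f.filelist[1]
# For archives created on Unix-like systems, the high 16 bits hold the file mode.
is_symlink = stat.S_ISLNK(linked_file.external_attr >> 16)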
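Putting the trailing-slash rule and the external_attr check together, a zip entry can be labeled roughly as follows. This is a sketch under stated assumptions: zip_entry_kind and build_sample_zip are hypothetical helpers, the sample archive is built in memory with made-up entries, and the mode bits are trusted only when the entry says it was created on a Unix-like system (create_system == 3).

```python
import io
import stat
import zipfile


def zip_entry_kind(info):
    """Return a short label for a ZipInfo entry (hypothetical helper)."""
    # ZipInfo.is_dir() (Python 3.6+) makes a similar trailing-slash check.
    if info.filename.endswith('/'):
        return 'directory'
    # Assumption: the high 16 bits are a Unix mode only when the
    # entry was written on a Unix-like system.
    mode = info.external_attr >> 16
    if info.create_system == 3 and stat.S_ISLNK(mode):
        return 'symlink'
    return 'file'


def build_sample_zip():
    """Build a tiny zip in memory; the entries are made up."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        zf.writestr('x/', '')          # directory entry
        zf.writestr('x/a', 'hello')    # regular file
        link = zipfile.ZipInfo('x/4')  # symlink whose target is '3'
        link.create_system = 3
        link.external_attr = (stat.S_IFLNK | 0o777) << 16
        zf.writestr(link, '3')
    buf.seek(0)
    return buf


with zipfile.ZipFile(build_sample_zip()) as zf:
    zip_kinds = {info.filename: zip_entry_kind(info) for info in zf.infolist()}

for name, kind in zip_kinds.items():
    print('{} is a {}'.format(name, kind))
```

Archives written by tools that do not record Unix modes will simply report such entries as files.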
Answer 2:
You can use the str.endswith(suffix) method to check whether the name has the expected file-name extension:
filenames = ['code.tar.gz', 'code2.bz2', 'code3.zip']
fileexts = ['.tar.gz', '.bz2', '.zip']

def check_extension():
    # Compare every file name against every known archive extension.
    for name in filenames:
        for ext in fileexts:
            if name.endswith(ext):
                print('The file: {} has the extension: {}'.format(name, ext))

check_extension()
which outputs:
The file: code.tar.gz has the extension: .tar.gz
The file: code2.bz2 has the extension: .bz2
The file: code3.zip has the extension: .zip
You would have to create a list of the file extensions for every archive type you want to check against, and load the file names into a list where you can easily run the check, but I think this would be a fairly effective way to solve your issue.
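The nested loop can be shortened, because str.endswith also accepts a tuple of suffixes. A small sketch, where ARCHIVE_EXTENSIONS and archive_extension are hypothetical names and the extension list is only an example; note that this identifies the archive type from its name alone and says nothing about the entries inside it.

```python
# Longer extensions come first so '.tar.bz2' is reported before '.bz2'.
ARCHIVE_EXTENSIONS = ('.tar.gz', '.tgz', '.tar.bz2', '.tar', '.bz2', '.zip')


def archive_extension(name):
    """Return the first matching extension, or None (hypothetical helper)."""
    lowered = name.lower()
    for ext in ARCHIVE_EXTENSIONS:
        if lowered.endswith(ext):
            return ext
    return None


def looks_like_archive(name):
    """Yes/no check using the tuple form of str.endswith."""
    return name.lower().endswith(ARCHIVE_EXTENSIONS)


for candidate in ('code.tar.gz', 'code2.bz2', 'code3.zip', 'notes.txt'):
    print(candidate, archive_extension(candidate), looks_like_archive(candidate))
```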
Answer 3:
I found the answer: we can use two calls, archive.getall_members() and archive.getfile_members().
We iterate over each of them and store the names in two arrays: a1 (file and folder names) and a2 (file names only). If both arrays contain an element, it is a file; otherwise it is a folder.
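This answer does not say which library provides those two calls, so the sketch below swaps in the standard tarfile module to show the same two-collection idea. split_files_and_folders and build_sample_tar are hypothetical helpers, and the sample archive is made up for illustration.

```python
import io
import tarfile


def split_files_and_folders(tf):
    """Split the member names of an open TarFile into (files, folders)."""
    a1 = set(tf.getnames())                                # every entry
    a2 = {m.name for m in tf.getmembers() if m.isfile()}   # regular files only
    # Whatever is not a regular file is treated as a folder here, so
    # symlinks and other special entries would be swept in as well.
    return a2, a1 - a2


def build_sample_tar():
    """Build a tiny tar in memory with one directory and one file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w') as tf:
        directory = tarfile.TarInfo('x')
        directory.type = tarfile.DIRTYPE
        tf.addfile(directory)

        data = b'hello'
        regular = tarfile.TarInfo('x/a')
        regular.size = len(data)
        tf.addfile(regular, io.BytesIO(data))
    buf.seek(0)
    return buf


with tarfile.open(fileobj=build_sample_tar()) as tf:
    files, folders = split_files_and_folders(tf)

print('files:', sorted(files))
print('folders:', sorted(folders))
```

Using sets instead of two arrays keeps the membership test cheap for large archives.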
Answer 4:
You can use the python-magic module and parse its output.
[root@jasonralph ~]# yum install python-pip
[root@jasonralph ~]# pip install python-magic
[root@jasonralph ~]# cat py_file_check.py
#!/usr/bin/python
import magic
print(magic.from_file('jason_ralph_org_20160215.tar.gz'))
[root@jasonralph ~]# file jason_ralph_org_20160215.tar.gz
jason_ralph_org_20160215.tar.gz: gzip compressed data, from Unix, last modified: Mon Feb 29 01:33:25 2016
[root@jasonralph ~]# python py_file_check.py
gzip compressed data, from Unix, last modified: Mon Feb 29 01:33:25 2016
Source: https://stackoverflow.com/questions/35690072/how-to-check-if-it-is-a-file-or-folder-for-an-archive-in-python