How to check if it is a file or folder for an archive in python?

Submitted by 耗尽温柔 on 2019-12-21 15:38:33

Question


I have an archive that I do not want to extract. I only want to check, for each of its entries, whether it is a file or a directory.

os.path.isdir and os.path.isfile do not work because I am working on an archive. The archive can be any one of tar, bz2, zip or tar.gz (so I cannot rely on a single format-specific library). The code should also work on any platform, such as Linux or Windows. How can I do this?


Answer 1:


You've stated that you need to support "tar, bz2, zip or tar.gz". Python's tarfile module automatically handles gz and bz2 compressed tar files, so there are really only two types of archive that you need to support: tar and zip. (bz2 by itself is not an archive format, it's just compression.)

You can determine whether a given file is a tar file with tarfile.is_tarfile(). This will also work on tar files compressed with gzip or bzip2 compression. Within a tar file you can determine whether a file is a directory using TarInfo.isdir() or a file with TarInfo.isfile().

Similarly, you can determine whether a file is a zip file using zipfile.is_zipfile(). Before Python 3.6, zipfile had no method to distinguish directories from normal files, but entries whose names end with / are directories. (Python 3.6 added ZipInfo.is_dir(), which performs that same check.)

So, given a file name, you can do this:

import zipfile
import tarfile

filename = 'test.tgz'

if tarfile.is_tarfile(filename):
    # tarfile.open() detects gzip/bzip2 compression automatically
    with tarfile.open(filename) as f:
        for info in f:
            if info.isdir():
                file_type = 'directory'
            elif info.isfile():
                file_type = 'file'
            else:
                file_type = 'unknown'   # symlinks, devices, FIFOs, etc.
            print('{} is a {}'.format(info.name, file_type))

elif zipfile.is_zipfile(filename):
    with zipfile.ZipFile(filename) as f:
        for name in f.namelist():
            # zip stores directories as entries whose names end with '/'
            print('{} is a {}'.format(name, 'directory' if name.endswith('/') else 'file'))

else:
    print('{} is not an accepted archive file'.format(filename))

When run on a tar file with this structure:

(py2)[mhawke@localhost tmp]$ tar tvfz /tmp/test.tgz
drwxrwxr-x mhawke/mhawke     0 2016-02-29 12:38 x/
lrwxrwxrwx mhawke/mhawke     0 2016-02-29 12:38 x/4 -> 3
drwxrwxr-x mhawke/mhawke     0 2016-02-28 21:14 x/3/
drwxrwxr-x mhawke/mhawke     0 2016-02-28 21:14 x/3/4/
-rw-rw-r-- mhawke/mhawke     0 2016-02-28 21:14 x/3/4/zzz
drwxrwxr-x mhawke/mhawke     0 2016-02-28 21:13 x/2/
-rw-rw-r-- mhawke/mhawke     0 2016-02-28 21:13 x/2/aa
drwxrwxr-x mhawke/mhawke     0 2016-02-28 21:13 x/1/
-rw-rw-r-- mhawke/mhawke     0 2016-02-28 21:13 x/1/abc
-rw-rw-r-- mhawke/mhawke     0 2016-02-28 21:13 x/1/ab
-rw-rw-r-- mhawke/mhawke     0 2016-02-28 21:13 x/1/a

The output is:

x is a directory
x/4 is a unknown
x/3 is a directory
x/3/4 is a directory
x/3/4/zzz is a file
x/2 is a directory
x/2/aa is a file
x/1 is a directory
x/1/abc is a file
x/1/ab is a file
x/1/a is a file

Notice that x/4 is "unknown" because it is a symbolic link.
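If symbolic links should get their own label instead of "unknown", TarInfo also offers issym() (and islnk() for hard links). The following is only a sketch: the helper name tar_entry_kind and the labels are made up for illustration, and the archive is built in memory so the example runs on its own.

```python
import io
import tarfile


def tar_entry_kind(info):
    """Map a TarInfo object to a short label (hypothetical helper)."""
    if info.isdir():
        return 'directory'
    if info.isfile():
        return 'file'
    if info.issym():
        return 'symlink'
    if info.islnk():
        return 'hard link'
    return 'unknown'


# Build a small tar archive in memory: a directory, a regular file, a symlink.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode='w') as archive:
    directory = tarfile.TarInfo('x')
    directory.type = tarfile.DIRTYPE
    archive.addfile(directory)

    data = b'hello'
    regular = tarfile.TarInfo('x/a')
    regular.size = len(data)
    archive.addfile(regular, io.BytesIO(data))

    link = tarfile.TarInfo('x/4')
    link.type = tarfile.SYMTYPE
    link.linkname = 'a'
    archive.addfile(link)

# Read the archive back and classify every entry without extracting anything.
buffer.seek(0)
with tarfile.open(fileobj=buffer) as archive:
    kinds = {info.name: tar_entry_kind(info) for info in archive}

print(kinds)
```

Device nodes and FIFOs still fall through to "unknown" here; TarInfo has further is...() methods for those if they matter.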

There is no easy way, with zipfile, to distinguish a symlink (or other file types) from a directory or normal file. The information is there in the ZipInfo.external_attr attribute, but it's messy to get it back out:

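import stat

# f is the zipfile.ZipFile opened above; entry 1 is assumed to be the symlink
linked_file = f.filelist[1]
# the upper 16 bits of external_attr hold the Unix file mode
is_symlink = stat.S_ISLNK(linked_file.external_attr >> 16)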
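To try the external_attr check without a prepared zip file, here is a self-contained sketch. It assumes the archive was written by a Unix-style tool, so that the upper 16 bits of external_attr carry the file mode; archives from other systems may leave those bits at zero. The helper name zip_entry_kind is hypothetical.

```python
import io
import stat
import zipfile


def zip_entry_kind(info):
    """Classify a ZipInfo entry from its Unix mode bits and its name."""
    mode = info.external_attr >> 16      # Unix st_mode, or 0 if not recorded
    if stat.S_ISLNK(mode):
        return 'symlink'
    if info.filename.endswith('/'):
        return 'directory'
    return 'file'


# Build a small zip in memory: a directory, a regular file, and a symlink.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('x/', '')
    archive.writestr('x/3', 'data')

    link = zipfile.ZipInfo('x/4')
    link.create_system = 3                              # 3 means Unix
    link.external_attr = (stat.S_IFLNK | 0o777) << 16   # mode: symlink
    archive.writestr(link, '3')                         # content is the link target

# Reopen the archive for reading and classify the entries.
with zipfile.ZipFile(buffer) as archive:
    kinds = {info.filename: zip_entry_kind(info) for info in archive.infolist()}

print(kinds)
```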



Answer 2:


You can use the str.endswith() method to check whether a file name has the expected extension:

filenames = ['code.tar.gz', 'code2.bz2', 'code3.zip']
fileexts = ['.tar.gz', '.bz2', '.zip']

def check_extension():
    for name in filenames:
        for ext in fileexts:
            if name.endswith(ext):
                print ('The file: ', name, ' has the extension: ', ext)


check_extension()

which outputs:

The file:  code.tar.gz  has the extension:  .tar.gz
The file:  code2.bz2  has the extension:  .bz2
The file:  code3.zip  has the extension:  .zip

You would need to list the extensions of every archive type you want to check for, and collect the file names into a list to run the check against, but I think this would be a fairly effective way to solve your issue.
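As a variation, str.endswith() also accepts a tuple of suffixes, which removes the inner loop. This is just a sketch: the extension list and the helper names matching_extension and looks_like_archive are assumptions, and a matching name only says what the file claims to be, not what it contains.

```python
# Longer extensions come first so '.tar.bz2' is reported rather than '.bz2'.
ARCHIVE_EXTENSIONS = ('.tar.gz', '.tar.bz2', '.tgz', '.tar', '.bz2', '.zip')


def matching_extension(name):
    """Return the first known archive extension that the name ends with, or None."""
    lowered = name.lower()
    for ext in ARCHIVE_EXTENSIONS:
        if lowered.endswith(ext):
            return ext
    return None


def looks_like_archive(name):
    """True if the name ends with any known archive extension."""
    # str.endswith() accepts a tuple and checks every suffix in it.
    return name.lower().endswith(ARCHIVE_EXTENSIONS)


for name in ['code.tar.gz', 'code2.bz2', 'code3.zip', 'notes.txt']:
    print(name, matching_extension(name), looks_like_archive(name))
```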




Answer 3:


I found the answer: we can use two calls, archive.getall_members() and archive.getfile_members().

We iterate over each of them and store the names in two arrays: a1 (file and folder names) and a2 (file names only). If an element appears in both arrays, it is a file; otherwise it is a folder.
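The comparison itself can be sketched with plain lists. The lists below are made-up stand-ins for whatever the two calls return, and split_files_and_folders is a hypothetical helper name.

```python
def split_files_and_folders(all_members, file_members):
    """Split all_members into (files, folders) using the files-only list."""
    file_set = set(file_members)   # set lookup avoids rescanning the list
    files = [name for name in all_members if name in file_set]
    folders = [name for name in all_members if name not in file_set]
    return files, folders


# a1: every entry name; a2: file names only (illustrative data)
a1 = ['x', 'x/1', 'x/1/a', 'x/2', 'x/2/aa']
a2 = ['x/1/a', 'x/2/aa']

files, folders = split_files_and_folders(a1, a2)
print('files:', files)
print('folders:', folders)
```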




Answer 4:


You can use the python-magic module and parse its output.

[root@jasonralph ~]# yum install python-pip

[root@jasonralph ~]# pip install python-magic

[root@jasonralph ~]# cat py_file_check.py
#!/usr/bin/python

import magic
print(magic.from_file('jason_ralph_org_20160215.tar.gz'))

[root@jasonralph ~]# file jason_ralph_org_20160215.tar.gz
jason_ralph_org_20160215.tar.gz: gzip compressed data, from Unix, last modified: Mon Feb 29 01:33:25 2016

[root@jasonralph ~]# python py_file_check.py
gzip compressed data, from Unix, last modified: Mon Feb 29 01:33:25 2016
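The file command identifies data largely from signature bytes near the start of the file. Where installing python-magic is not an option, the few signatures relevant to this question can be checked by hand. Note that this is a swapped-in standard-library sketch, not python-magic's API: sniff_compression and its labels are hypothetical, and it recognises only gzip, bzip2 and non-empty zip data.

```python
import bz2
import gzip
import io
import zipfile

# Leading signature bytes of the formats discussed in this thread.
SIGNATURES = [
    (b'\x1f\x8b', 'gzip compressed data'),
    (b'BZh', 'bzip2 compressed data'),
    (b'PK\x03\x04', 'zip archive data'),
]


def sniff_compression(data):
    """Return a label for the given leading bytes, or 'unknown'."""
    for signature, label in SIGNATURES:
        if data.startswith(signature):
            return label
    return 'unknown'


# Produce sample data in memory so the example runs without any files.
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, 'w') as archive:
    archive.writestr('a.txt', 'hello')

print(sniff_compression(gzip.compress(b'hello')))
print(sniff_compression(bz2.compress(b'hello')))
print(sniff_compression(zip_buffer.getvalue()))
print(sniff_compression(b'plain text'))
```

For a file on disk, reading the first few bytes in binary mode and passing them to the helper is enough.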


Source: https://stackoverflow.com/questions/35690072/how-to-check-if-it-is-a-file-or-folder-for-an-archive-in-python
