Python: Extracting specific files with pattern from tar.gz without extracting the complete file

梦如初夏 2021-01-20 11:47

I want to extract all files with the pattern *_sl_H* from many tar.gz files, without extracting all files from the archives.

I found these lines, but it

2 Answers
  • 2021-01-20 12:29

    You can extract all files matching your pattern from many tar archives as follows:

    1. Use glob to get a list of all the *.tar or *.tar.gz files in a given folder.

    2. For each archive, get the list of its members with the getmembers() function.

    3. Use a regular expression (or a simple if "xxx" in name test) to keep only the required files.

    4. Pass this list of matching members to the members parameter of the extractall() function.

    5. Add exception handling to catch unreadable or corrupt tar files.

    For example:

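    import glob
    import re
    import tarfile

    # Match any member whose name contains "_sl_H"
    reT = re.compile(r'_sl_H')

    # *.tar* matches both .tar and .tar.gz files in the folder
    for tar_filename in glob.glob(r'\my_source_folder\*.tar*'):
        try:
            # Mode 'r' detects gzip compression automatically
            t = tarfile.open(tar_filename, 'r')
        except (OSError, tarfile.TarError) as e:
            # Unreadable or corrupt archive: report it and move on
            print(e)
        else:
            # The with block closes the archive after extraction
            with t:
                t.extractall('outdir', members=[m for m in t.getmembers() if reT.search(m.name)])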
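
The pattern in the question, *_sl_H*, is a shell-style wildcard, so it can be applied as written with fnmatch instead of being translated into a regular expression. The following is a minimal sketch of that idea, not the answerer's code: the function name extract_matching and its parameters are hypothetical, and it assumes the pattern should be tested against each member's base name. It passes the "data" extraction filter only when the running Python version provides it.

```python
import fnmatch
import posixpath
import tarfile


def extract_matching(archive_path, out_dir, pattern="*_sl_H*"):
    """Extract regular files whose base name matches a shell-style pattern.

    Hypothetical helper for illustration; returns the extracted member names.
    """
    # "r:*" lets tarfile detect gzip/bz2/xz compression by itself.
    with tarfile.open(archive_path, "r:*") as tar:
        wanted = [
            m for m in tar.getmembers()
            # Member names inside a tar archive use "/" as separator.
            if m.isfile()
            and fnmatch.fnmatchcase(posixpath.basename(m.name), pattern)
        ]
        # Use the "data" extraction filter where this Python supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(out_dir, members=wanted, **kwargs)
    return [m.name for m in wanted]
```

To process many archives, call it in a loop, for example for path in glob.glob("*.tar.gz"): extract_matching(path, "outdir").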
    
  • 2021-01-20 12:41

    Take a look at the TarFile.getmembers() method, which returns the members of the archive as a list. Once you have this list, you can use a condition to decide which files to extract.

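    import os
    import tarfile

    # exist_ok avoids an error if the folder already exists
    os.makedirs('outdir', exist_ok=True)

    # The with block closes the archive when the loop finishes
    with tarfile.open('example.tar', 'r') as t:
        for member in t.getmembers():
            # Extract only members whose name contains "_sl_H"
            if "_sl_H" in member.name:
                t.extract(member, "outdir")

    print(os.listdir('outdir'))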
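
If the matching files only need to be read, not written to disk, TarFile.extractfile() returns a file object for a single member. The sketch below is an illustration under assumptions: read_matching and its needle parameter are hypothetical names, and it assumes the matching files are small enough to hold in memory. It also iterates over the archive directly, which yields one member at a time instead of building the full list first. Note that a gzip-compressed archive still has to be decompressed sequentially to reach later members; only the matching files are kept.

```python
import tarfile


def read_matching(archive_path, needle="_sl_H"):
    """Return {member name: bytes} for regular files whose name contains needle.

    Hypothetical helper for illustration; nothing is written to disk.
    """
    contents = {}
    with tarfile.open(archive_path, "r:*") as tar:
        # Iterating a TarFile yields one TarInfo object per member.
        for member in tar:
            if member.isfile() and needle in member.name:
                # extractfile() returns a read-only binary file object.
                with tar.extractfile(member) as fh:
                    contents[member.name] = fh.read()
    return contents
```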
    