I want to extract all files with the pattern *_sl_H*
from many tar.gz files, without extracting all files from the archives.
I found these lines, but it
You can extract all files matching your pattern from many tar archives as follows:
Use glob to get a list of all of the *.tar or *.gz files in a given folder.
For each tar file, get a list of its members using the getmembers() method.
Use a regular expression (or a simple if "xxx" in test) to filter the required files.
Pass this list of matching files to the members parameter of the extractall() method.
Exception handling is added to catch badly encoded tar files.
For example:
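import tarfile
import glob
import re

# search() finds "_sl_H" anywhere in the member name
reT = re.compile(r'.*?_sl_H.*?')

for tar_filename in glob.glob(r'\my_source_folder\*.tar'):
    try:
        # Mode 'r' also reads compressed archives transparently
        t = tarfile.open(tar_filename, 'r')
    except (IOError, tarfile.TarError) as e:
        # TarError covers files that cannot be read as a tar archive
        print(e)
    else:
        t.extractall('outdir', members=[m for m in t.getmembers() if reT.search(m.name)])
        t.close()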
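As a variation on the example above, here is a minimal sketch aimed at the .tar.gz files and the *_sl_H* pattern from the question. It uses fnmatch instead of a regular expression so the shell-style pattern can be used unchanged. The function name extract_matching and its parameters are hypothetical, chosen only for illustration, and matching against the base name of each member is an assumption about what the question needs.

```python
import fnmatch
import glob
import os
import posixpath
import tarfile


def extract_matching(source_dir, out_dir, pattern="*_sl_H*"):
    """Extract files whose base name matches `pattern` from every .tar.gz in source_dir.

    Returns the member names that were extracted (hypothetical helper).
    """
    extracted = []
    for archive in sorted(glob.glob(os.path.join(source_dir, "*.tar.gz"))):
        try:
            # "r:*" lets tarfile detect the compression (gzip here) by itself.
            with tarfile.open(archive, "r:*") as t:
                wanted = [
                    m for m in t.getmembers()
                    # Member names inside a tar always use "/" as separator.
                    if m.isfile()
                    and fnmatch.fnmatchcase(posixpath.basename(m.name), pattern)
                ]
                # Only the selected members are written to disk.
                t.extractall(out_dir, members=wanted)
                extracted.extend(m.name for m in wanted)
        except (OSError, tarfile.TarError) as e:
            # Unreadable or corrupt archives are reported and skipped.
            print("Skipping %s: %s" % (archive, e))
    return extracted
```

Calling extract_matching('my_source_folder', 'outdir') would process every .tar.gz file in that folder and return the names of the members it extracted.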
Take a look at the TarFile.getmembers() method, which returns the members of the archive as a list. Once you have this list, you can use a condition to decide which files to extract.
import tarfile
import os

os.mkdir('outdir')
t = tarfile.open('example.tar', 'r')
for member in t.getmembers():
    # Extract only members whose name contains "_sl_H"
    if "_sl_H" in member.name:
        t.extract(member, "outdir")
t.close()

print(os.listdir('outdir'))
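If you first want to check which members the condition would select, without writing anything to disk, a small dry-run helper along these lines may be useful. This is only a sketch: the function name matching_names and its needle parameter are hypothetical and not part of tarfile.

```python
import tarfile


def matching_names(tar_path, needle="_sl_H"):
    """Return names of regular-file members containing `needle`, without extracting."""
    # "r:*" opens plain and compressed archives alike.
    with tarfile.open(tar_path, "r:*") as t:
        # Iterating over a TarFile yields its TarInfo members one by one.
        return [m.name for m in t if m.isfile() and needle in m.name]
```

For instance, matching_names('example.tar') would list the candidate members so you can confirm the filter before calling extract().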