Question
I have a very large .tar.gz file which I can't extract all at once because of a lack of disk space. I would like to extract half of its contents, process them, and then extract the remaining half.
The archive contains several subdirectories, which in turn contain files. When I extract a subdirectory, I need all its contents to be extracted with it.
What's the best way of doing this in bash? Does tar already allow this?
Answer 1:
You can also extract files one at a time. Put -C before the member name, since GNU tar applies it only to the names that follow it:
tar -xzvf file.tar.gz -C DESTINATION/dir PATH/to/file/inside_archive
You can wrap this in a script:
1) Keep the directory layout under DESTINATION the same as the PATH inside the archive (you can use your own base directory for DESTINATION).
2) You can get the paths of the files inside the archive using
tar -tzf file.tar.gz
3) You can loop over them with for files in $(tar -tzf file.tar.gz) and define a break condition as required. Without -v, the listing contains only member names, so no awk filtering is needed.
I would have done something like:
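#!/bin/bash
# The destination must exist before tar can change into it.
mkdir -p ./My_localDir
# -t without -v prints one member name per line.
tar -tzf file.tar.gz | while IFS= read -r files
do
    # Skip directory entries; naming one would extract its whole subtree.
    case "$files" in */) continue ;; esac
    subDir=$(dirname "$files")
    echo "$subDir"
    # tar recreates the member's path, so extract into the base directory only.
    tar -C ./My_localDir -xzvf file.tar.gz "$files"
done
$subDir contains the name of the subdirectory that holds the current file.
Add a break condition to the loop above according to your requirements.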
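As an illustration of such a break condition, here is a minimal sketch that stops after a fixed number of files. The temporary demo archive and the names workdir, max_files and count are assumptions made for the example, not part of the answer above.

```shell
#!/bin/bash
# Demo setup (hypothetical names): build a small archive in a temp directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/src/archive/a" "$workdir/src/archive/b"
echo 1 > "$workdir/src/archive/a/file1"
echo 2 > "$workdir/src/archive/a/file2"
echo 3 > "$workdir/src/archive/b/file3"
tar -czf "$workdir/file.tar.gz" -C "$workdir/src" archive

max_files=2      # assumed limit: stop after this many regular files
count=0
mkdir -p "$workdir/My_localDir"

# Process substitution keeps the loop in the current shell,
# so the value of count is still available after the loop.
while IFS= read -r member; do
    case "$member" in */) continue ;; esac    # skip directory entries
    if [ "$count" -ge "$max_files" ]; then
        break                                 # the break condition
    fi
    tar -xzf "$workdir/file.tar.gz" -C "$workdir/My_localDir" "$member"
    count=$((count + 1))
done < <(tar -tzf "$workdir/file.tar.gz")

echo "extracted $count files"
```

Note that every tar call decompresses the archive from the beginning, so extracting one file per call is slow on a very large archive. Passing several member names to a single tar call avoids that cost.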
Answer 2:
You can for example extract only files which match some pattern:
tar -xvzf largefile.tar.gz --wildcards --no-anchored '*.html'
So, depending on the structure of largefile.tar.gz, you can extract the files that match one pattern, process them, delete them, then extract the files that match another pattern, and so on. Note that --wildcards and --no-anchored are GNU tar options.
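The extract, process, delete cycle could be sketched as a loop over patterns. This assumes GNU tar; the demo archive, the two patterns and the counting step that stands in for real processing are all hypothetical.

```shell
#!/bin/bash
# Demo setup (hypothetical names): a small archive with mixed file types.
workdir=$(mktemp -d)
mkdir -p "$workdir/src/site/docs"
echo '<p>a</p>' > "$workdir/src/site/index.html"
echo '<p>b</p>' > "$workdir/src/site/docs/page.html"
echo 'notes'    > "$workdir/src/site/docs/notes.txt"
tar -czf "$workdir/largefile.tar.gz" -C "$workdir/src" site

processed=0
for pattern in '*.html' '*.txt'; do
    batch="$workdir/batch"
    mkdir -p "$batch"
    # Extract only the members that match the current pattern.
    tar -xzf "$workdir/largefile.tar.gz" -C "$batch" \
        --wildcards --no-anchored "$pattern"
    # Placeholder for real processing: count the extracted files.
    n=$(find "$batch" -type f | wc -l)
    processed=$((processed + n))
    # Free the space before extracting the next batch.
    rm -rf "$batch"
done

echo "processed $processed files"
```

This only covers the whole archive if the patterns together match every file, which is worth checking against the output of tar -tzf first.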
Answer 3:
OK, so based on this answer, I can list all contents at the desired depth. In my case, the tar.gz file is structured as follows:
archive.tar.gz:
archive/
archive/a/
archive/a/file1
archive/a/file2
archive/a/file3
archive/b/
archive/b/file4
archive/b/file5
archive/c/
archive/c/file6
So I want to loop over the subdirectories a, b and c and, for instance, extract the first two of them:
parent_folder='archive/'
max_num=2
counter=0
# --exclude="*/*/*" limits the listing to archive/ and its direct subdirectories.
for subdir in $(tar --exclude="*/*/*" -tf archive.tar.gz); do
    if [ "$subdir" = "$parent_folder" ]; then
        echo 'not this one'
        continue
    fi
    if [ "$counter" -lt "$max_num" ]; then
        # Member paths already start with archive/, so tar recreates
        # archive/<subdir>/ in the current directory; no -C is needed.
        tar -xzvf archive.tar.gz "$subdir"
        counter=$((counter + 1))
    fi
done
Next, for the remaining files:
max_num=2
counter=0
for subdir in $(tar --exclude="*/*/*" -tf archive.tar.gz); do
    if [ "$subdir" = "$parent_folder" ]; then
        echo 'not this one'
        continue
    fi
    # Skip the first max_num subdirectories, which were handled in the first pass.
    if [ "$counter" -ge "$max_num" ]; then
        tar -xzvf archive.tar.gz "$subdir"
    fi
    counter=$((counter + 1))
done
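Since each tar call reads the compressed archive from the start, a variant that extracts each batch with a single call may be faster on a very large archive. The following is only a sketch under assumptions: the demo archive mirrors the layout shown above, the subdirectory list is derived with awk rather than --exclude, and the names workdir, out, first_batch and second_batch are hypothetical.

```shell
#!/bin/bash
# Demo setup: recreate the layout from the listing above in a temp directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/src/archive/a" "$workdir/src/archive/b" "$workdir/src/archive/c"
touch "$workdir/src/archive/a/file1" "$workdir/src/archive/a/file2" \
      "$workdir/src/archive/a/file3" "$workdir/src/archive/b/file4" \
      "$workdir/src/archive/b/file5" "$workdir/src/archive/c/file6"
archive="$workdir/archive.tar.gz"
tar -czf "$archive" -C "$workdir/src" archive

# Collect the unique second-level paths (archive/a, archive/b, ...).
subdirs=()
while IFS= read -r d; do
    subdirs+=("$d")
done < <(tar -tzf "$archive" |
         awk -F/ 'NF >= 2 && $2 != "" {print $1 "/" $2}' | sort -u)

max_num=2
first_batch=("${subdirs[@]:0:max_num}")
second_batch=("${subdirs[@]:max_num}")

out="$workdir/out"
mkdir -p "$out"

# First pass: one tar call extracts the whole batch, contents included.
tar -xzf "$archive" -C "$out" "${first_batch[@]}"
first_count=$(find "$out" -type f | wc -l)    # placeholder for processing
rm -rf "$out/archive"                         # free the space again

# Second pass: guard against an empty list, because tar with no
# member names would extract the entire archive.
if [ "${#second_batch[@]}" -gt 0 ]; then
    tar -xzf "$archive" -C "$out" "${second_batch[@]}"
fi
```

Naming a directory as a member makes tar extract everything beneath it, which satisfies the requirement that each subdirectory comes out with all of its contents.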
Source: https://stackoverflow.com/questions/24057301/bash-extract-only-part-of-tar-gz-archive