bash: extract only part of tar.gz archive

Posted by 有些话、适合烂在心里 on 2019-12-13 19:15:36

Question


I have a very large .tar.gz file which I can't extract all together because of lack of space. I would like to extract half of its contents, process them, and then extract the remaining half.

The archive contains several subdirectories, which in turn contain files. When I extract a subdirectory, I need all its contents to be extracted with it.

What's the best way of doing this in bash? Does tar already allow this?


Answer 1:


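You can also extract files one at a time. Put -C before the member name, since -C only affects the names that follow it, and make sure the destination directory already exists:

tar zxvf file.tar.gz -C DESTINATION/dir PATH/to/file/inside_archive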

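You can wrap this in a script:

1) Keep the directory layout under DESTINATION the same as PATH inside the archive (you can choose any base directory for DESTINATION).

2) You can list the paths of the files inside the archive using

tar -ztvf file.tar.gz

3) You can loop over the entries, for example with for files in $(tar -ztvf file.tar.gz | awk '{print $NF}'), and define a break condition as required. Note that $NF takes the last whitespace-separated field of the verbose listing, so this breaks on names that contain spaces.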

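I would have done something like:

#!/bin/bash
# tar -C needs an existing directory; member paths are recreated below it
mkdir -p ./My_localDir
# -t without -v prints one member path per line
tar -ztf file.tar.gz | while IFS= read -r files
do
    subDir=$(dirname "$files")
    echo "$subDir"
    tar -C ./My_localDir -zxvf file.tar.gz "$files"
done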

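$subDir holds the subdirectory of the entry currently being extracted.

Add a break condition to the loop above according to your requirements. Keep in mind that every tar call re-reads the compressed archive from the start, so this is slow for very large archives.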
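One possible break condition is a simple file counter. The sketch below is only an illustration: the demo directory tar_break_demo, the fixture files, and the max_files limit are assumptions made so that the example is self-contained, not part of the original answer.

```shell
#!/bin/sh
# Demo fixture (hypothetical names): a small archive with three subdirectories.
demo=tar_break_demo
rm -rf "$demo"
mkdir -p "$demo/src/a" "$demo/src/b" "$demo/src/c" "$demo/My_localDir"
echo 1 > "$demo/src/a/file1"
echo 2 > "$demo/src/b/file2"
echo 3 > "$demo/src/c/file3"
# Naming a, b, c explicitly fixes the member order inside the archive.
(cd "$demo/src" && tar -czf ../file.tar.gz a b c)

max_files=2   # assumed limit: stop once this many files are extracted
count=0
tar -tzf "$demo/file.tar.gz" > "$demo/members.txt"
while IFS= read -r member
do
    # Directory entries end in "/"; skip them so only individual files are counted.
    case "$member" in
        */) continue ;;
    esac
    if [ "$count" -ge "$max_files" ]; then
        break   # the break condition
    fi
    tar -xzf "$demo/file.tar.gz" -C "$demo/My_localDir" "$member"
    count=$((count + 1))
done < "$demo/members.txt"
echo "extracted $count files"
```

Reading the member list from a file instead of a pipe keeps the counter in the main shell, so it is still available after the loop.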




Answer 2:


You can, for example, extract only the files that match a pattern:

tar -xvzf largefile.tar.gz --wildcards --no-anchored '*.html'

So, depending on the structure of largefile.tar.gz, you can extract the files matching one pattern, process them, delete them, then extract the files matching another pattern, and so on.
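The extract, process, delete cycle could be scripted roughly as follows. This is a sketch that assumes GNU tar (for --wildcards and --no-anchored); the demo directory tar_pattern_demo, the two patterns, and the "processing" step (just logging file names) are placeholders, not part of the original answer.

```shell
#!/bin/sh
# Demo fixture (hypothetical names): one directory with an HTML and a CSS file.
demo=tar_pattern_demo
rm -rf "$demo"
mkdir -p "$demo/src/site" "$demo/out"
echo '<p>hi</p>' > "$demo/src/site/index.html"
echo 'body {}' > "$demo/src/site/style.css"
(cd "$demo/src" && tar -czf ../largefile.tar.gz site)

: > "$demo/processed.log"
for pattern in '*.html' '*.css'
do
    # Extract only the members matching the current pattern.
    tar -xzf "$demo/largefile.tar.gz" -C "$demo/out" --wildcards --no-anchored "$pattern"
    # Stand-in for real processing: record each extracted file name.
    find "$demo/out" -type f | sort >> "$demo/processed.log"
    # Free the space before the next batch.
    rm -rf "$demo/out"/*
done
```

Only one batch is on disk at any time, which is the point of the approach; the cost is one full pass over the archive per pattern.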




Answer 3:


OK, so based on this answer, I can list all the contents at the desired depth. In my case, the tar.gz file is structured as follows:

archive.tar.gz:
archive/
archive/a/
archive/a/file1
archive/a/file2
archive/a/file3
archive/b/
archive/b/file4
archive/b/file5
archive/c/
archive/c/file6

So I want to loop over the subdirectories a, b, c and, for instance, extract the first two of them:

parent_folder='archive/'
max_num=2
counter=0
# --exclude="*/*/*" limits the listing to archive/ and its direct subdirectories
for subdir in $(tar --exclude="*/*/*" -tf archive.tar.gz); do
    # Skip the top-level directory itself
    if [ "$subdir" = "$parent_folder" ]; then
        echo 'not this one'
        continue
    fi
    if [ "$counter" -lt "$max_num" ]; then
        # Member names already start with archive/, so extract into the current directory
        tar zxvf archive.tar.gz "$subdir"
        counter=$((counter + 1))
    fi
done

Next, for the remaining subdirectories:

parent_folder='archive/'
max_num=2
counter=0
for subdir in $(tar --exclude="*/*/*" -tf archive.tar.gz); do
    if [ "$subdir" = "$parent_folder" ]; then
        echo 'not this one'
        continue
    fi
    # Skip the first max_num subdirectories, which were extracted earlier
    if [ "$counter" -ge "$max_num" ]; then
        tar zxvf archive.tar.gz "$subdir"
    fi
    counter=$((counter + 1))
done
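Because each tar call decompresses the archive from the beginning, it may be worth extracting each half in a single pass. The sketch below is a variation on the loops above, not the answerer's method: it writes the subdirectory names to a file and hands each half to tar with -T (--files-from). The demo directory tar_halves_demo, the reduced fixture, and the file-count "processing" step are assumptions for illustration.

```shell
#!/bin/sh
# Demo fixture (hypothetical): archive/ with subdirectories a, b, c, one file each.
demo=tar_halves_demo
rm -rf "$demo"
mkdir -p "$demo/src/archive/a" "$demo/src/archive/b" "$demo/src/archive/c" "$demo/out"
echo 1 > "$demo/src/archive/a/file1"
echo 4 > "$demo/src/archive/b/file4"
echo 6 > "$demo/src/archive/c/file6"
(cd "$demo/src" && tar -czf ../archive.tar.gz archive)

# Unique second-level entries: archive/a, archive/b, archive/c.
tar -tzf "$demo/archive.tar.gz" \
    | awk -F/ 'NF >= 2 && $2 != "" {print $1 "/" $2}' \
    | sort -u > "$demo/subdirs.txt"

max_num=2
head -n "$max_num" "$demo/subdirs.txt" > "$demo/first.txt"
tail -n "+$((max_num + 1))" "$demo/subdirs.txt" > "$demo/rest.txt"

# First half: one tar call extracts every subdirectory named in first.txt.
(cd "$demo/out" && tar -xzf ../archive.tar.gz -T ../first.txt)

# Stand-in for real processing: count the extracted files, then free the space.
find "$demo/out" -type f | wc -l | tr -d ' ' > "$demo/first_count.txt"
rm -rf "$demo/out"/*

# Second half: the remaining subdirectories, again in one pass.
(cd "$demo/out" && tar -xzf ../archive.tar.gz -T ../rest.txt)
```

With this layout, each half costs one read of the archive regardless of how many subdirectories it contains.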


Source: https://stackoverflow.com/questions/24057301/bash-extract-only-part-of-tar-gz-archive
