Copying a 1TB sparse file

Submitted by 时光怂恿深爱的人放手 on 2019-11-30 03:28:01

Short answer: Use bsdtar or GNU tar (version 1.29 or later) to create the archive, and GNU tar (version 1.26 or later) to extract it on the other box.

Long answer: There are some requirements for this to work.

First, the Linux kernel must be version 3.1 or later (Ubuntu 12.04 or later will do), so that it supports the SEEK_HOLE functionality.

Then you need a tar utility that makes use of SEEK_HOLE (an lseek() option). GNU tar has supported it since version 1.29 (released 2016-05-16; it should be present by default since Ubuntu 18.04), and bsdtar since version 3.0.4 (available since Ubuntu 12.04). Install bsdtar with sudo apt-get install bsdtar.
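A quick way to compare a machine against the minimums quoted above (kernel 3.1, GNU tar 1.29, bsdtar 3.0.4) is to print what is installed. This is only a sketch: it reports the versions and leaves the comparison to the reader, and the variable names are made up for illustration.

```shell
#!/bin/sh
# Print the kernel release and the first line of each tar's version banner
# so they can be checked by eye against the required minimums.
kernel_release="$(uname -r)"
tar_banner="$(tar --version 2>/dev/null | head -n 1)"

echo "kernel: $kernel_release"
echo "tar:    ${tar_banner:-not found}"

# bsdtar is optional; only query it when it is actually on the PATH.
if command -v bsdtar >/dev/null 2>&1; then
    echo "bsdtar: $(bsdtar --version | head -n 1)"
else
    echo "bsdtar: not installed"
fi
```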

While bsdtar (which uses libarchive) is awesome, it is unfortunately not very smart when it comes to untarring: it insists on at least as much free space on the target drive as the full size of the untarred file, without regard to holes. GNU tar untars such sparse archives efficiently and does not check this condition.

This is a log from Ubuntu 12.10 (Linux kernel 3.5):

$ dd if=/dev/zero of=1tb seek=1T bs=1 count=1
1+0 records in
1+0 records out
1 byte (1 B) copied, 0.000143113 s, 7.0 kB/s

$ time bsdtar cvfz sparse.tar.gz 1tb 
a 1tb

real    0m0.362s
user    0m0.336s
sys     0m0.020s

# Or use GNU tar, if the version is 1.29 or later:
$ time tar cSvfz sparse-gnutar.tar.gz 1tb
1tb

real    0m0.005s
user    0m0.006s
sys     0m0.000s

$ ls -l
-rw-rw-r-- 1 autouser autouser 1099511627777 Nov  7 01:43 1tb
-rw-rw-r-- 1 autouser autouser           257 Nov  7 01:43 sparse.tar.gz
-rw-rw-r-- 1 autouser autouser           134 Nov  7 01:43 sparse-gnutar.tar.gz
$

As I said above, untarring with bsdtar will unfortunately not work unless you have 1TB of free space. However, any version of GNU tar untars such a sparse.tar.gz just fine:

$ rm 1tb 
$ time tar -xvSf sparse.tar.gz 
1tb

real    0m0.031s
user    0m0.016s
sys     0m0.016s
$ ls -l
total 8
-rw-rw-r-- 1 autouser autouser 1099511627777 Nov  7 01:43 1tb
-rw-rw-r-- 1 autouser autouser           257 Nov  7 01:43 sparse.tar.gz
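Note that ls -l only shows the apparent size, so it cannot tell a sparse file from a fully allocated one. One way to check that a file really is sparse is to compare its apparent size with the space actually allocated for it. The sketch below assumes GNU coreutils (truncate, stat -c, du -k) and uses a throwaway file called sparse-demo, a name invented for this example.

```shell
#!/bin/sh
# Work in a scratch directory so nothing in the current directory is touched.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Create a file with a 1 GiB apparent size and no data written to it.
truncate -s 1G sparse-demo

# Apparent size in bytes (what ls -l shows) versus allocated space in KiB.
apparent_bytes="$(stat -c %s sparse-demo)"
allocated_kib="$(du -k sparse-demo | cut -f1)"

echo "apparent size: $apparent_bytes bytes"
echo "allocated:     $allocated_kib KiB"
```

If the allocated figure is far below the apparent size, the holes are intact. The same two commands can be pointed at an extracted file such as 1tb.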

From a related question, maybe rsync will work:

rsync --sparse sparse-1 sparse-1-copy

I realize this question is very old, but here's an update that may be helpful to others who find their way here the same way I did.

Thankfully, mvp's excellent answer is now obsolete. According to the GNU tar release notes, support for SEEK_HOLE/SEEK_DATA was added in v. 1.29, released 2016-05-16. (And with GNU tar v. 1.30 being standard in Debian stable now, it's safe to assume that tar version ≥ 1.29 is available almost everywhere.)

So the way to handle sparse files now is to archive them with whichever tar (GNU or BSD) is installed on your system, and same for extracting.

Additionally, for sparse files that actually contain some data, if it's worthwhile to use compression (i.e., the data is compressible enough to save substantial disk space, and the disk space savings are worth the likely-substantial time and CPU resources required to compress it):

  • tar -cSjf <archive>.tar.bz2 /path/to/sparse/file will both take advantage of tar's SEEK_HOLE functionality to efficiently archive the sparse file and use bzip2 to compress the actual data.
  • tar --use-compress-program=pbzip2 -cSf <archive>.tar.bz2 /path/to/sparse/file, as alluded to in marcin's comment, will do the same while also taking advantage of multiple cores for the compression task.

On my little home server with a quad-core Atom CPU, using pbzip2 vs bzip2 reduced the time by around 25 or 30%.

With or without compression, this will give you an archive that doesn't need any special sparse-file handling, takes up approximately the 'real' size of the original sparse file (or less if compressed), and can be easily moved around with cp, rsync (both of which can be used on the original sparse file without trashing the sparseness), or scp (which can't).
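For the cp case mentioned above, a minimal sketch of a sparseness-preserving copy might look like this. It assumes GNU coreutils, whose cp accepts --sparse=always; the file names sparse-src and sparse-copy are invented for the example.

```shell
#!/bin/sh
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Build a 256 MiB sparse file with a few real bytes at the 1 MiB offset.
truncate -s 256M sparse-src
printf 'payload' | dd of=sparse-src bs=1 seek=1048576 conv=notrunc 2>/dev/null

# Ask cp to create holes in the destination wherever the source has zero runs.
cp --sparse=always sparse-src sparse-copy

src_bytes="$(stat -c %s sparse-src)"
copy_bytes="$(stat -c %s sparse-copy)"
copy_kib="$(du -k sparse-copy | cut -f1)"

# The copy should be byte-identical while occupying far less than 256 MiB.
cmp sparse-src sparse-copy && echo "contents identical"
echo "copy: $copy_bytes bytes apparent, $copy_kib KiB allocated"
```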

Additional Notes

  1. When extracting, tar will automatically detect an archive created with -S so there's no need to specify it.
  2. An archive created with pbzip2 is stored in chunks. This results in the archive being marginally bigger than if bzip2 is used, but also means that the extraction can be multithreaded, unlike an archive created with bzip2.
  3. pbzip2 and bzip2 will reliably extract each other's archives without error or corruption.
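Note 1 can be tried out on a small scale: archive a sparse file with -S, extract it without -S, and compare sizes. This is a sketch assuming GNU tar and GNU coreutils; sparse-demo and the restored directory are names made up for the example, and no compression is used so that only tar's own sparse handling is involved.

```shell
#!/bin/sh
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# A 512 MiB sparse file with a few real bytes near the start.
truncate -s 512M sparse-demo
printf 'payload' | dd of=sparse-demo bs=1 seek=4096 conv=notrunc 2>/dev/null

# Create the archive with -S so that holes are recorded instead of stored.
tar -cSf sparse-demo.tar sparse-demo
archive_bytes="$(stat -c %s sparse-demo.tar)"

# Extract without -S; the sparse layout is read from the archive itself.
mkdir restored
tar -xf sparse-demo.tar -C restored

restored_bytes="$(stat -c %s restored/sparse-demo)"
restored_kib="$(du -k restored/sparse-demo | cut -f1)"

echo "archive:  $archive_bytes bytes"
echo "restored: $restored_bytes bytes apparent, $restored_kib KiB allocated"
```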

You're definitely looking for an archiving or compression tool such as tar, lzma, bzip2, zip or rar. According to this site, lzma is quite fast while still having quite a good compression ratio:

http://blog.terzza.com/linux-compression-comparison-gzip-vs-bzip2-vs-lzma-vs-zip-vs-compress/

You can also adjust the trade-off between speed and compression ratio by setting the compression level to something low; experiment a bit to find the level that works best:

http://linux.die.net/man/1/unlzma
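To get a feel for what the compression level does, something like the following can be run on a sample of your own data. It is only a sketch: it uses gzip as a stand-in because it is installed almost everywhere (xz and lzma take the same -1 to -9 style of level flags), and sample.txt is generated test data, not a real workload.

```shell
#!/bin/sh
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Generate some compressible sample data (stand-in for a real file).
seq 1 200000 > sample.txt

# Compress the same input at the fastest and at the strongest level.
gzip -1 -c sample.txt > fast.gz
gzip -9 -c sample.txt > best.gz

original_bytes="$(wc -c < sample.txt)"
fast_bytes="$(wc -c < fast.gz)"
best_bytes="$(wc -c < best.gz)"

echo "original: $original_bytes bytes"
echo "gzip -1:  $fast_bytes bytes"
echo "gzip -9:  $best_bytes bytes"
```

Prefixing each gzip call with time shows the other half of the trade-off.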
