Archival filesystem or format

Posted on 2020-02-03 05:13:23

Question


I'm looking for a file format for storing archives of systems that have been decommissioned. At the moment, we primarily use tar.gz, but finding and extracting just a few files from a 200GB tar.gz archive is unwieldy, since tar.gz doesn't support any sort of random-access read provision. (And before you get the idea, mounting a tgz using FUSE doesn't make it better.)

Here's what we've found so far -- I'd like to know what other options there are:

  • tar.gz -- poor random-access read
  • zip -- lacks support for some advanced filesystem features (e.g. hard links, xattrs)
  • squashfs -- takes an extremely long time to create a large archive (many hours), and the userspace tools are poor.

I'm trying to think of a simple way of packing a full-featured filesystem image into as small a space as possible -- e.g. ext2 inside a cloop image -- but that doesn't seem like a particularly user-friendly solution.
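For concreteness, the ext2-in-cloop idea mentioned above might look roughly like this. This is a hedged sketch: the paths, the image size, and the output names are all illustrative assumptions, and `create_compressed_fs` comes from the cloop-utils package.

```shell
# Create a sparse image file and put an ext2 filesystem inside it
truncate -s 200G archive.img
mkfs.ext2 -F archive.img

# Mount it via loopback and copy the decommissioned system's data in,
# preserving hard links, ACLs, and xattrs
sudo mount -o loop archive.img /mnt/archive
sudo rsync -aHAX /srv/old-system/ /mnt/archive/
sudo umount /mnt/archive

# Compress into a cloop image, which supports random-access reads
create_compressed_fs archive.img archive.cloop
```

The result can later be attached through the cloop kernel module and mounted read-only, which is exactly the part that makes this workflow less user-friendly than a plain archive file.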

Presumably this problem has been solved before -- are there any options I've missed?


Answer 1:


virt-sparsify can be used to sparsify and (through qemu's qcow2 gzip support) compress almost any linux filesystem or disk image. The resulting images can be mounted in a VM, or on the host through guestmount.
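A minimal sketch of that workflow, assuming the libguestfs tools are installed; the image names and the extracted path are illustrative:

```shell
# Sparsify the raw image and convert it to a compressed qcow2
virt-sparsify --compress old-server.img archived.qcow2

# Later, mount the archived image read-only on the host to pull files out,
# letting guestmount inspect the image for the OS layout (-i)
guestmount -a archived.qcow2 -i --ro /mnt/archive
cp /mnt/archive/etc/fstab /tmp/
guestunmount /mnt/archive
```

Because qcow2 compresses per-cluster, reads of individual files stay reasonably cheap even on large images.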

There's a new nbdkit xz plugin that can be used for higher compression, which still keeps good random-access performance (as long as you ask xz/pixz to reset compression on block boundaries).
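A rough sketch of that combination, assuming xz-utils and an nbdkit build with the xz plugin; the block size, filenames, and device node are assumptions, and the exact `nbd-client` invocation varies between versions:

```shell
# Compress with periodic block resets so random access stays cheap
xz --keep --block-size=16MiB -9 disk.img      # or: pixz -9 disk.img

# Serve the compressed image as a network block device
nbdkit xz file=disk.img.xz

# Attach it and mount the filesystem inside, read-only
sudo nbd-client localhost /dev/nbd0
sudo mount -o ro /dev/nbd0 /mnt/archive
```

The key point is the `--block-size` option: without block resets, xz is a single stream and any read requires decompressing everything before it.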




Answer 2:


Mksquashfs is a highly parallelised program, and makes use of all available cores to maximise performance. If you're seeing very large build times then you either have a lot of duplicate files, or the machine is running short of memory and thrashing.

To investigate performance, you can first use the -no-duplicates option on Mksquashfs, i.e.

mksquashfs xxx xxx.sqsh -no-duplicates

Duplicate checking is a slow operation and it has to be done sequentially, and on file sets with a lot of duplicates this becomes a bottleneck on an otherwise parallelised program.

Check memory usage/free memory while Mksquashfs is running; if the system is thrashing, very low performance will occur. Investigate the -read-queue, -write-queue and -fragment-queue options to control how much data Mksquashfs caches at run-time.

Tar and zip are not parallelised and use only one core, and so it is difficult to believe your complaint about Mksquashfs compression performance.

Also, I have never seen any other reports that the userspace programs are "poor". Mksquashfs and Unsquashfs have an advanced set of options which allow very fine control over the compression process, and allow users to select which files are compressed - options considerably in advance of programs like tar.

Unless you can give concrete examples of why the tools are poor, I will put this down to the usual case of the workman blaming the tools, whereas the real problem is elsewhere.

As I said previously, your system is probably thrashing and hence performing badly. By default Mksquashfs uses all available cores, and a minimum of 600 Mbytes of RAM (rising to 2 GBytes or more on large filesystems). This is for performance as caching data in memory reduces disk I/O. This "out of the box" behaviour is good for typical users which have large amounts of memory, and an otherwise idle system. This is what the majority of users want, a Mksquashfs which "maxes out" the system to achieve as fast as possible filesystem creation.

It is not good for systems with low RAM, or for systems with active processes consuming a large amount of the available CPU, and/or memory. You will simply get resource contention as each process contends for the available CPU and RAM. This is not a fault of Mksquashfs, but of the user.

The Mksquashfs -processors option is there to limit the number of processors Mksquashfs uses; the -read-queue, -write-queue and -fragment-queue options are there to control how much RAM is used by Mksquashfs.
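Putting the options above together, invocations for a busy or low-RAM machine might look like this. The source path, output name, core count, and queue sizes (in Mbytes) are illustrative assumptions:

```shell
# Skip duplicate detection, which is sequential and slow on
# duplicate-heavy file sets
mksquashfs /srv/old-system archive.sqsh -no-duplicates

# Cap CPU and memory use so Mksquashfs doesn't contend with other
# processes or push the system into thrashing
mksquashfs /srv/old-system archive.sqsh \
    -processors 4 \
    -read-queue 64 -write-queue 64 -fragment-queue 64
```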




Answer 3:


ZFS has pretty decent compression capabilities, if memory serves. That said, I've never actually used it. :-)
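For the record, enabling ZFS compression is a one-liner per dataset. A hedged sketch - the pool name, device, and dataset are made up, and gzip-9 trades CPU for ratio (lz4 is the usual cheap default):

```shell
# Create a pool on a spare disk and a compressed dataset for the archive
sudo zpool create archivepool /dev/sdb
sudo zfs create -o compression=gzip-9 archivepool/decommissioned

# Copy the data in; later reads are transparent and random-access
sudo rsync -aHAX /srv/old-system/ /archivepool/decommissioned/

# Check how well the data compressed
sudo zfs get compressratio archivepool/decommissioned
```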




Answer 4:


As this is Stack Overflow, I assume you are looking for a library/code. You might check our SolFS virtual file system. It doesn't support hard links, but alternate streams are supported (for xattrs) and tags are supported (for unix attributes). Symlinks are also supported, so you can convert hard links to symlinks when creating the archive.



Source: https://stackoverflow.com/questions/6147303/archival-filesystem-or-format
