File holes are empty regions in a file that read back as null bytes but do not take up any disk space. Therefore, the file's reported size is larger than the space it actually occupies on disk.
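One way to see the difference between the two sizes is to ask stat(2) for both. The following is only a sketch: the function names are made up for illustration, and the 512-byte unit for st_blocks is what Linux uses, so treat it as an assumption on other systems.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: store the apparent size and the allocated size
 * (both in bytes) of `path`. Returns 0 on success, -1 if stat() fails. */
int file_sizes(const char *path, long long *apparent, long long *allocated)
{
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;
    *apparent = (long long)st.st_size;          /* the size ls -l reports */
    *allocated = (long long)st.st_blocks * 512; /* assumes 512-byte units, as on Linux */
    return 0;
}

/* Hypothetical helper: print both sizes for `path`. */
int print_sizes(const char *path)
{
    long long apparent, allocated;

    if (file_sizes(path, &apparent, &allocated) == -1) {
        perror(path);
        return -1;
    }
    printf("%s: apparent %lld bytes, allocated %lld bytes\n",
           path, apparent, allocated);
    return 0;
}
```

For a file with holes, the allocated figure will typically be smaller than the apparent one. A small file without holes can show the opposite, because space is allocated in whole blocks.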
Aside from creating files with holes, since ~2 months ago (mid-January 2011), you can punch holes in existing files on Linux, using fallocate(2) with FALLOC_FL_PUNCH_HOLE (LWN article, git commit on Linus' tree, patch to Linux's manpages).
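A minimal sketch of what such a call might look like follows. It is Linux-specific, the helper name punch_hole is made up for illustration, and whether the call succeeds depends on the filesystem (it fails with EOPNOTSUPP where hole punching is not supported). Per the fallocate(2) man page, FALLOC_FL_PUNCH_HOLE has to be combined with FALLOC_FL_KEEP_SIZE.

```c
#define _GNU_SOURCE        /* glibc only declares fallocate() when this is set */
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>  /* FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: deallocate `len` bytes starting at `offset` in the
 * existing file `path`. The file size stays the same and the punched range
 * reads back as zeros. Returns 0 on success, -1 on failure with errno set. */
int punch_hole(const char *path, off_t offset, off_t len)
{
    int fd = open(path, O_WRONLY);
    int saved_errno;

    if (fd < 0)
        return -1;
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  offset, len) == -1) {
        saved_errno = errno;   /* close() may overwrite errno */
        close(fd);
        errno = saved_errno;
        return -1;
    }
    return close(fd);
}
```

Calling punch_hole("some_file", 4096, 4096) would, on a filesystem that supports it, free the second 4096-byte block of some_file without changing its length.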
Seek to position N in a new file and write some data. There will be a hole at the start of the file (up to, and excluding, position N). You can similarly create files with holes in the middle.
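As a rough sketch of that idea in C (the function name and the 0644 mode are illustrative choices, not something from the answer):

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` with a hole covering bytes [0, n),
 * followed by `len` bytes of data. Returns 0 on success, -1 on failure. */
int write_after_hole(const char *path, off_t n, const char *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    /* Seeking past the end of the file is allowed; the skipped
     * range becomes a hole once something is written after it. */
    if (lseek(fd, n, SEEK_SET) == (off_t)-1 ||
        write(fd, data, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

For a hole in the middle, write some data first, then seek forward past the current end of the file and write again. Whether the skipped range is really left unallocated is up to the filesystem.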
The following document has some sample C code (search for "Sparse files"): http://www.win.tue.nl/~aeb/linux/lk/lk-6.html
The problem is carefully discussed in section 3.6 of W. Richard Stevens' famous book "Advanced Programming in the UNIX Environment" (APUE for short). The lseek function, declared in unistd.h, is used here; it sets an open file's offset explicitly. The prototype of the lseek function is as follows:
off_t lseek(int filedes, off_t offset, int whence);
Here, filedes is the file descriptor, offset is the value we want to set, and whence is one of three constants defined in the header file: SEEK_SET, meaning that the file's offset is set to offset bytes from the beginning of the file; SEEK_CUR, meaning that the file's offset is set to its current value plus the offset in the argument list; SEEK_END, meaning that the file's offset is set to the size of the file plus the offset in the argument list.
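To make the three whence values concrete, here is a small sketch. The function name and the 10-byte test file are made up for illustration.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: write 10 bytes to a fresh file, then reposition three
 * ways. The offsets returned by lseek are stored in out[0..2].
 * Returns 0 on success, -1 on failure. */
int whence_demo(const char *path, off_t out[3])
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    if (write(fd, "0123456789", 10) != 10) {
        close(fd);
        return -1;
    }
    out[0] = lseek(fd, 4, SEEK_SET);   /* 4 bytes from the beginning: 4 */
    out[1] = lseek(fd, 3, SEEK_CUR);   /* current offset (4) plus 3: 7 */
    out[2] = lseek(fd, 100, SEEK_END); /* file size (10) plus 100: 110 */
    close(fd);
    if (out[0] == (off_t)-1 || out[1] == (off_t)-1 || out[2] == (off_t)-1)
        return -1;
    return 0;
}
```

Note that the last call moves the offset past the end of the file without changing the file's size. The file only grows, and a hole only appears, once something is written at the new offset.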
An example that creates a file with a hole in C under UNIX-like OSs is as follows:
/* Creating a file with a hole of size 810 */
#include "apue.h" /* APUE's helper header: provides err_sys() and FILE_MODE */
#include <fcntl.h>

/* Two strings to write to the file */
char buf1[] = "abcde";
char buf2[] = "ABCDE";

int main(void)
{
    int fd; /* file descriptor */

    if ((fd = creat("file_with_hole", FILE_MODE)) < 0)
        err_sys("creat error");
    if (write(fd, buf1, 5) != 5)
        err_sys("buf1 write error");
    /* offset now 5 */
    if (lseek(fd, 815, SEEK_SET) == -1)
        err_sys("lseek error");
    /* offset now 815 */
    if (write(fd, buf2, 5) != 5)
        err_sys("buf2 write error");
    /* offset now 820 */
    return 0;
}
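In the code above, err_sys is the APUE helper that handles a fatal error from a system call: it prints a message and terminates the program. Both err_sys and FILE_MODE come from the book's apue.h header, not from the standard library.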
Use the dd command with a seek parameter.
dd if=/dev/urandom bs=4096 count=2 of=file_with_holes
dd if=/dev/urandom bs=4096 seek=7 count=2 of=file_with_holes
That creates for you a file with a nice hole from byte 8192 to byte 28671.
Here's an example, demonstrating that indeed the file has holes in it (the ls -s command tells you how many disk blocks are being used by a file):
$ dd if=/dev/urandom bs=4096 count=2 of=fwh # fwh = file with holes
2+0 records in
2+0 records out
8192 bytes (8.2 kB) copied, 0.00195565 s, 4.2 MB/s
$ dd if=/dev/urandom seek=7 bs=4096 count=2 of=fwh
2+0 records in
2+0 records out
8192 bytes (8.2 kB) copied, 0.00152742 s, 5.4 MB/s
$ dd if=/dev/zero bs=4096 count=9 of=fwnh # fwnh = file with no holes
9+0 records in
9+0 records out
36864 bytes (37 kB) copied, 0.000510568 s, 72.2 MB/s
$ ls -ls fw*
16 -rw-rw-r-- 1 hopper hopper 36864 Mar 15 10:25 fwh
36 -rw-rw-r-- 1 hopper hopper 36864 Mar 15 10:29 fwnh
As you can see, the file with holes takes up fewer disk blocks, despite being the same size.
If you want a program that does it, here it is:
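#include <unistd.h>
#include <sys/types.h>
#include <stdio.h>
#include <string.h>
#include <fcntl.h>

int main(int argc, const char *argv[])
{
    char random_garbage[8192];
    int fd = -1;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s <filename>\n", argv[0]);
        return 1;
    }
    /* The contents don't matter; fill the buffer so no uninitialized memory is written */
    memset(random_garbage, 'x', sizeof random_garbage);
    fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0) {
        perror("Can't open file"); /* perror appends ": " and the error text itself */
        return 2;
    }
    /* 8192 bytes of data, a 5 * 4096 byte hole, then 8192 more bytes of data */
    if (write(fd, random_garbage, 8192) != 8192 ||
        lseek(fd, 5 * 4096, SEEK_CUR) == (off_t)-1 ||
        write(fd, random_garbage, 8192) != 8192) {
        perror("Can't write file");
        close(fd);
        return 3;
    }
    close(fd);
    return 0;
}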
The above should work on any Unix. Someone else replied with a nice alternative method that is very Linux-specific. I highlight it here because it's a method distinct from the two I gave, and it can be used to put holes in existing files.