I'm trying to figure out the best way to find the number of files in a particular directory when there are a very large number of files (more than 100,000).
Surprisingly to me, a bare-bones find is very comparable in speed to ls -f:
> time ls -f my_dir | wc -l
17626
real 0m0.015s
user 0m0.011s
sys 0m0.009s
versus
> time find my_dir -maxdepth 1 | wc -l
17625
real 0m0.014s
user 0m0.008s
sys 0m0.010s
Of course, the values in the third decimal place shift around a bit every time you run either command, so the two are basically identical. Notice, however, that find returns one extra entry, because it counts the directory itself (and, as mentioned before, ls -f returns two extra entries, since it also counts . and ..).
The fastest Linux file count I know is
locate -c -r '/home'
There is no need to invoke grep! But, as mentioned, you need a fresh database (updated daily by a cron job, or manually with sudo updatedb).
From man locate
-c, --count
Instead of writing file names on standard output, write the number of matching
entries only.
Additionally, you should know that it also counts directories as files!
BTW: if you want an overview of the files and directories on your system, type
locate -S
It outputs the number of directories, files, etc.
ls spends extra time sorting the file names. Use -f to disable sorting, which will save some time:
ls -f | wc -l
Or you can use find (note that this counts regular files recursively, including those in subdirectories):
find . -type f | wc -l
To list the 10 directories with the highest number of files (this assumes dir ends with a slash and that no names contain whitespace):
dir=/ ; for i in $(ls -1 ${dir} | sort -n) ; do echo "$(find ${dir}${i} \
-type f | wc -l) => $i,"; done | sort -nr | head -10
Use find. For example:
find . -name "*.ext" | wc -l
The fastest way on Linux (the question is tagged Linux) is to use a direct system call. Here's a little program that counts only regular files (no directories) in a directory. You can count millions of files with it, and it is around 2.5 times faster than "ls -f" and around 1.3-1.5 times faster than Christopher Schultz's answer.
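#define _GNU_SOURCE
#include <dirent.h>       /* DT_REG */
#include <stdio.h>
#include <fcntl.h>        /* open, O_DIRECTORY */
#include <stdlib.h>
#include <sys/syscall.h>  /* SYS_getdents */
#include <unistd.h>       /* syscall, close */

#define BUF_SIZE 4096

/* Record layout returned by the getdents system call (see man 2 getdents).
   glibc does not provide this struct, so it is declared here. */
struct linux_dirent {
    unsigned long  d_ino;     /* inode number */
    unsigned long  d_off;     /* offset to the next record */
    unsigned short d_reclen;  /* length of this record */
    char           d_name[];  /* file name; d_type is the record's last byte */
};

/* Return the number of regular files (not directories, symlinks, ...) in dir.
   Entries on filesystems that report DT_UNKNOWN are not counted. */
int countDir(char *dir) {
    int fd, nread, bpos, numFiles = 0;
    char d_type, buf[BUF_SIZE];
    struct linux_dirent *dirEntry;

    fd = open(dir, O_RDONLY | O_DIRECTORY);
    if (fd == -1) {
        perror("open directory error");
        exit(3);
    }
    while (1) {
        /* Fill buf with as many directory records as fit. Note that
           SYS_getdents does not exist on every architecture (arm64 only
           has SYS_getdents64). */
        nread = syscall(SYS_getdents, fd, buf, BUF_SIZE);
        if (nread == -1) {
            perror("getdents error");
            exit(1);
        }
        if (nread == 0) {
            break;  /* end of directory */
        }
        /* Walk the variable-length records in the buffer. */
        for (bpos = 0; bpos < nread;) {
            dirEntry = (struct linux_dirent *) (buf + bpos);
            d_type = *(buf + bpos + dirEntry->d_reclen - 1);
            if (d_type == DT_REG) {
                numFiles++;  /* count regular files only */
            }
            bpos += dirEntry->d_reclen;
        }
    }
    close(fd);
    return numFiles;
}

int main(int argc, char **argv) {
    if (argc != 2) {
        puts("Pass directory as parameter");
        return 2;
    }
    printf("Number of files in %s: %d\n", argv[1], countDir(argv[1]));
    return 0;
}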
PS: It is not recursive, but you could modify it to achieve that.