Fast Linux file count for a large number of files

名媛妹妹 2020-12-22 17:21

I'm trying to figure out the best way to find the number of files in a particular directory when there are a very large number of files (more than 100,000).

When the

17 answers
  • 2020-12-22 17:59

    Surprisingly to me, a bare-bones find performs about the same as ls -f:

    > time ls -f my_dir | wc -l
    17626
    
    real    0m0.015s
    user    0m0.011s
    sys     0m0.009s
    

    versus

    > time find my_dir -maxdepth 1 | wc -l
    17625
    
    real    0m0.014s
    user    0m0.008s
    sys     0m0.010s
    

    Of course, the third decimal place shifts a little on every run, so the two are effectively identical. Note, however, that find reports one extra entry because it counts my_dir itself, and, as mentioned before, ls -f reports two extra entries because it also lists . and ..
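
    To get the same number from both tools, the extra entries have to be excluded. A minimal sketch, assuming a find that supports -mindepth, -maxdepth and -print0 (GNU and BSD find both do); the function names count_entries and count_entries_ls are made up for illustration:

```shell
# Count the entries inside a directory, excluding the directory itself.
# -mindepth 1 drops the starting directory; -maxdepth 1 prevents recursion.
# Emitting one NUL byte per entry and counting bytes stays correct even
# when a file name contains a newline.
count_entries() {
    find "$1" -mindepth 1 -maxdepth 1 -print0 | tr -cd '\0' | wc -c
}

# Same count via ls -f, subtracting the "." and ".." entries it lists.
# Assumes no file name contains a newline.
count_entries_ls() {
    echo $(( $(ls -f "$1" | wc -l) - 2 ))
}
```

    For example, count_entries my_dir and count_entries_ls my_dir should print the same number. Hidden files and subdirectories are included in both counts.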

  • 2020-12-22 18:00

    Fast Linux file count

    The fastest Linux file count I know is

    locate -c -r '/home'
    

    There is no need to invoke grep! But as mentioned, you need a fresh database (updated daily by a cron job, or manually with sudo updatedb). Note that the regex is unanchored, so '/home' matches every path that contains /home anywhere.

    From man locate

    -c, --count
        Instead  of  writing  file  names on standard output, write the number of matching
        entries only.
    

    Additionally, be aware that it counts directories as files too!


    BTW: if you want an overview of the files and directories on your system, type

    locate -S
    

    It outputs the number of directories, files, etc.

  • 2020-12-22 18:00

    ls spends extra time sorting the file names. Use -f to disable sorting, which saves some time:

    ls -f | wc -l
    

    Or you can use find:

    find . -type f | wc -l
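
    Note that the two commands count different things: ls -f counts every entry in the current directory (including . and ..), while find . -type f descends into subdirectories and counts regular files only. A small sketch of both variants with find; the function names are hypothetical, and file names are assumed not to contain newlines:

```shell
# Regular files directly inside the directory (no recursion).
count_files_flat() {
    find "$1" -maxdepth 1 -type f | wc -l
}

# Regular files in the directory and all of its subdirectories.
count_files_recursive() {
    find "$1" -type f | wc -l
}
```

    Running count_files_flat . answers the original question for a single directory; count_files_recursive . matches the find command above.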
    
  • 2020-12-22 18:02

    To list the 10 directories containing the most files:

    dir=/ ; for i in "$dir"*/ ; do
        echo "$(find "$i" -type f | wc -l) => $i"
    done | sort -nr | head -10
    
  • 2020-12-22 18:03

    Use find. For example:

    find . -name "*.ext" | wc -l
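
    The quotes around the pattern matter: they stop the shell from expanding *.ext before find sees it. A minimal sketch that wraps the command in a function; count_by_ext is a made-up name, and adding -type f to skip directories is my assumption about what is wanted:

```shell
# Count regular files under a directory whose names end in a given extension.
# $1: directory to search (recursively)
# $2: extension without the leading dot, e.g. "log"
# Assumes file names do not contain newlines.
count_by_ext() {
    find "$1" -type f -name "*.$2" | wc -l
}
```

    For example, count_by_ext . ext gives the same result as the command above, provided no directory name ends in .ext.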
    
  • 2020-12-22 18:03

    The fastest way on Linux (the question is tagged Linux) is to make the system call directly. Here's a little program that counts regular files only (no directories) in a directory. It can count millions of files, and it is around 2.5 times faster than "ls -f" and around 1.3-1.5 times faster than Christopher Schultz's answer.

    #define _GNU_SOURCE
    #include <dirent.h>
    #include <stdio.h>
    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>       /* syscall(), close() */
    #include <sys/syscall.h>
    
    #define BUF_SIZE 4096
    
    /* Record layout returned by getdents, as documented in getdents(2) */
    struct linux_dirent {
        unsigned long d_ino;
        unsigned long d_off;
        unsigned short d_reclen;
        char d_name[];
    };
    
    int countDir(char *dir) {
    
        int fd, nread, bpos, numFiles = 0;
        char d_type, buf[BUF_SIZE];
        struct linux_dirent *dirEntry;
    
        fd = open(dir, O_RDONLY | O_DIRECTORY);
        if (fd == -1) {
            puts("open directory error");
            exit(3);
        }
        while (1) {
            nread = syscall(SYS_getdents, fd, buf, BUF_SIZE);
            if (nread == -1) {
                puts("getdents error");
                exit(1);
            }
            if (nread == 0) {
                break;
            }
    
            for (bpos = 0; bpos < nread;) {
                dirEntry = (struct linux_dirent *) (buf + bpos);
                // d_type is stored in the last byte of each record
                d_type = *(buf + bpos + dirEntry->d_reclen - 1);
                // Note: some filesystems report DT_UNKNOWN instead of DT_REG
                if (d_type == DT_REG) {
                    numFiles++;
                }
                bpos += dirEntry->d_reclen;
            }
        }
        close(fd);
    
        return numFiles;
    }
    
    int main(int argc, char **argv) {
    
        if (argc != 2) {
            puts("Pass directory as parameter");
            return 2;
        }
        printf("Number of files in %s: %d\n", argv[1], countDir(argv[1]));
        return 0;
    }
    

    PS: It is not recursive, but you could modify it to achieve that.
