How to read the first N files from a directory (please NOT a "head -n" solution)?

Question

I have a directory with more than 60000 files. How can I get only N of them without using a find | head -n or ls | head -n solution, since find and ls take too much time to read the full list of files? Are there any options for ls and find, or any other programs, that can help save time?

Answer 1:

For what it's worth:

# Create 60000 files (named 00000 to 59999)
sh$ for i in {0..99}; do
    for j in {0..599}; do
        touch "$(printf "%05d" $((i + j*100)))";
    done;
done

On Linux Debian Wheezy x86_64 w/ext4 file system:

sh$ time bash -c 'ls | head -n 50000 | tail -10'
49990
49991
49992
49993
49994
49995
49996
49997
49998
49999

real    0m0.248s
user    0m0.212s
sys     0m0.024s

sh$ time bash -c 'ls -f | head -n 50000 | tail -10'
27235
02491
55530
44435
24255
47247
16033
45447
18434
35303

real    0m0.051s
user    0m0.016s
sys     0m0.028s

sh$ time bash -c 'find | head -n 50000 | tail -10'
./02491
./55530
./44435
./24255
./47247
./16033
./45447
./18434
./35303
./07658

real    0m0.051s
user    0m0.024s
sys     0m0.024s

sh$ time bash -c 'ls -f | sed -n 49990,50000p'
30950
27235
02491
55530
44435
24255
47247
16033
45447
18434
35303

real    0m0.046s
user    0m0.032s
sys     0m0.016s

Of course, the following two are faster, as they only take the first entries (and they interrupt the peer process with a broken pipe once the required "lines" have been read):

sh$ time bash -c 'ls -f | sed 1000q >/dev/null'

real    0m0.008s
user    0m0.004s
sys     0m0.000s

sh$ time bash -c 'ls -f | head -1000>/dev/null'

real    0m0.008s
user    0m0.000s
sys     0m0.004s

Interestingly enough (?), with sed the time is spent in user space, whereas with head it is spent in sys (kernel) time. After several runs, the results are consistent...
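The early-exit effect can be observed directly. The following is only a sketch: `run_truncated` is a hypothetical helper name, and `seq 1 1000000` is used as a stand-in for `ls -f` on a huge directory (any producer whose output is much larger than a pipe buffer should behave similarly). It pipes a command into `head -n N` and reports on stderr how the producer ended.

```shell
#!/bin/sh
# run_truncated N CMD [ARGS...]: pipe CMD into `head -n N` and report the
# producer's exit status on stderr. Hypothetical helper, not a standard tool.
run_truncated() {
    n=$1; shift
    status_file=$(mktemp)
    # The brace group records the producer's exit status once it has stopped.
    { "$@"; echo "$?" >"$status_file"; } | head -n "$n"
    echo "producer exit status: $(cat "$status_file")" >&2
    rm -f "$status_file"
}

# Stand-in for `ls -f` on a very large directory.
run_truncated 3 seq 1 1000000
```

On a typical Linux shell the reported status is 141 (128 + SIGPIPE), meaning the producer was stopped as soon as head had read its lines rather than running to completion. If SIGPIPE is ignored in the calling environment, the producer instead exits with a write error.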

Answer 2:

You could write your own simple utility in C.

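#include <stdio.h>
#include <stdlib.h>   /* atoi */
#include <string.h>   /* strcmp */
#include <sys/types.h>
#include <dirent.h>

/* Usage: prog DIRECTORY N  (prints the first N entries in directory order) */
int main(int argc, char **argv) {
  DIR *dir;
  struct dirent *ent;
  int i = 0, n;
  if (argc != 3) {
    fprintf(stderr, "usage: %s DIRECTORY N\n", argv[0]);
    return 1;
  }
  n = atoi(argv[2]);
  dir = opendir(argv[1]);
  if (dir == NULL) {
    perror(argv[1]);
    return 1;
  }
  /* Stop reading the directory as soon as N names have been printed. */
  while (i < n && (ent = readdir(dir)) != NULL) {
    /* Skip the "." and ".." entries. */
    if (strcmp(ent->d_name, ".") == 0 ||
        strcmp(ent->d_name, "..") == 0)
      continue;
    printf("%s\n", ent->d_name);
    i++;
  }
  closedir(dir);
  return 0;
}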

This is just a quick and dirty first draft, but you get the idea.

Answer 3:

You can use sed with q:

find ... | sed 10q  ## Prints 1st to 10th line.

That makes sed exit after the 10th line, which will probably make find finish sooner as well.

Another way is to use awk, but sed is still more efficient:

find ... | awk 'NR==11{exit}1'

find ... | awk '1;NR==10{exit}'
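The three commands above can be compared side by side. This is a minimal sketch with hypothetical function names, parameterised on N and fed by `seq` instead of `find` so the output is predictable:

```shell
#!/bin/sh
# Three ways to keep only the first N lines of a stream and then stop reading.
# The function names are hypothetical; N is passed as the first argument.
first_sed()       { sed "${1}q"; }
first_awk_late()  { awk -v n="$1" 'NR == n + 1 { exit } 1'; }  # exits on seeing line N+1
first_awk_early() { awk -v n="$1" '1; NR == n { exit }'; }     # exits right after printing line N

# Each of these prints the lines 1, 2 and 3.
seq 1 100 | first_sed 3
seq 1 100 | first_awk_late 3
seq 1 100 | first_awk_early 3
```

All three print the same lines. The difference is when they stop: the NR == n + 1 form has to wait for one more line to arrive before it exits, which matters when the producer (such as find on a slow file system) is slow to deliver it.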

Answer 4:

ls -f directory | sed -n 1,10p       # print lines 1-10

Option of ls:

-f: do not sort (this also enables -a, so ., .. and hidden files are listed)
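Because ls -f also lists the . and .. entries, the first N lines may contain fewer than N real names. One possible way around this, sketched here with a hypothetical helper `first_names`, is to filter those two entries out before truncating:

```shell
#!/bin/sh
# first_names DIR N: print up to N entry names from DIR in directory order,
# skipping the "." and ".." entries that `ls -f` includes.
# `first_names` is a hypothetical helper name.
first_names() {
    ls -f -- "$1" | grep -v -x -F -e . -e .. | head -n "$2"
}

# Demonstration on a throwaway directory.
demo_dir=$(mktemp -d)
( cd "$demo_dir" && touch a b c d e f g h )
first_names "$demo_dir" 5
rm -rf "$demo_dir"
```

The order of the five names depends on the file system, so it is not predictable. Hidden files are still included, and, like every line-based approach in this thread, the sketch assumes that file names do not contain newlines.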

Source: https://stackoverflow.com/questions/24723689/how-to-read-first-n-th-files-from-directory-pleaso-not-a-head-n-solution

Tags

bash

find

head

tail