How to read the first N files from a directory (please NOT a “head -n” solution)?

Submitted by 本小妞迷上赌 on 2019-12-24 12:30:43

Question


I have a directory with more than 60000 files. How can I get only N of them without using a find | head -n or ls | head -n solution, since find and ls take too much time to read this list of files? Are there any options for ls and find, or any other programs, that can help save time?


Answer 1:


For what it's worth:

# Create about 60000 files (names 00000 through 60100)
sh$ for i in {0..100}; do
    for j in {0..600}; do
        touch $(printf "%05d" $(($i+$j*100)));
    done;
done

On Linux Debian Wheezy x86_64 w/ext4 file system:

sh$ time bash -c 'ls | head -n 50000 | tail -10'
49990
49991
49992
49993
49994
49995
49996
49997
49998
49999

real    0m0.248s
user    0m0.212s
sys     0m0.024s


sh$ time bash -c 'ls -f | head -n 50000 | tail -10'
27235
02491
55530
44435
24255
47247
16033
45447
18434
35303

real    0m0.051s
user    0m0.016s
sys     0m0.028s


sh$ time bash -c 'find | head -n 50000 | tail -10'
./02491
./55530
./44435
./24255
./47247
./16033
./45447
./18434
./35303
./07658

real    0m0.051s
user    0m0.024s
sys     0m0.024s


sh$ time bash -c 'ls -f | sed -n 49990,50000p'
30950
27235
02491
55530
44435
24255
47247
16033
45447
18434
35303

real    0m0.046s
user    0m0.032s
sys     0m0.016s

Of course, the following two are faster, as they only take the first entries: once the required lines have been read, sed or head exits, and ls is then terminated by a broken pipe (SIGPIPE) on its next write:

sh$ time bash -c 'ls -f | sed 1000q >/dev/null'

real    0m0.008s
user    0m0.004s
sys     0m0.000s


sh$ time bash -c 'ls -f | head -n 1000 >/dev/null'

real    0m0.008s
user    0m0.000s
sys     0m0.004s

Interestingly, with sed the time is spent in user space, whereas with head it is spent in the kernel (sys). The results are consistent across several runs.




Answer 2:


You could write your own simple utility in C.

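#include <stdio.h>
#include <stdlib.h>     /* atoi */
#include <string.h>     /* strcmp */
#include <sys/types.h>
#include <dirent.h>

/* Usage: prog DIRECTORY N  (prints at most N entry names, unsorted) */
int main(int argc, char **argv) {
  DIR *dir;
  struct dirent *ent;
  int i = 0, n = 0;
  if (argc != 3) {
    fprintf(stderr, "usage: %s DIRECTORY N\n", argv[0]);
    return 1;
  }
  n = atoi(argv[2]);
  dir = opendir(argv[1]);
  if (dir == NULL) {
    perror(argv[1]);
    return 1;
  }
  while ((ent = readdir(dir)) != NULL) {
    /* skip the "." and ".." entries */
    if (strcmp(ent->d_name, ".") == 0 ||
        strcmp(ent->d_name, "..") == 0)
      continue;
    if (i++ >= n) break;  /* stop once n names have been printed */
    printf("%s\n", ent->d_name);
  }
  closedir(dir);
  return 0;
}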

This is just a quick and dirty first draft, but you get the idea.




Answer 3:


You can use sed with q:

find ... | sed 10q  ## Prints 1st to 10th line.

That makes sed exit after the 10th line. find then gets a broken pipe on its next write and stops early, so it will probably finish sooner than if it listed everything.

Another way is to use awk, but sed is still more efficient:

find ... | awk 'NR==11{exit}1'

Or

find ... | awk '1;NR==10{exit}'
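
As a rough sketch of how the sed variant could be wrapped up for a single directory: the function name first_files, the -maxdepth 1 and -type f arguments, and the parameter order are all assumptions made for illustration, not part of the answer above. -maxdepth is a GNU/BSD extension rather than POSIX, and N must be at least 1 because sed rejects a line address of 0.

```shell
# first_files DIR N: print up to N regular files found directly in DIR.
# first_files is a hypothetical helper name used only for this sketch.
first_files() {
    # -maxdepth 1 keeps find from descending into subdirectories;
    # -type f keeps regular files only.
    # sed "Nq" prints N lines and then quits, which closes the pipe so
    # that find stops on a later write instead of listing everything.
    find "$1" -maxdepth 1 -type f | sed "${2}q"
}

# Example call (the path is only a placeholder):
#   first_files /some/big/directory 10
```

The order of the names is whatever order the directory yields them in, so this returns some N files, not the N alphabetically first ones. File names containing newlines would be counted as several lines.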



Answer 4:


ls -f directory | sed -n 1,10p       # print lines 1-10

Option of ls:

-f: do not sort (this also enables -a, so ".", ".." and hidden files are listed)
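
One thing to watch for with ls -f is that it also lists "." and ".." (and hidden files), so the first N lines are not necessarily N real entries. A minimal sketch of one way to filter those two names out before counting follows; the helper name first_entries and its argument order are assumptions for illustration only.

```shell
# first_entries DIR N: print up to N entry names from DIR in directory
# order, without "." and "..". first_entries is a hypothetical name.
first_entries() {
    # ls -f     : unsorted listing; also shows ".", ".." and dotfiles
    # grep -vxF : drop lines that are exactly "." or ".." (-F makes the
    #             dots literal, -x matches whole lines, -v inverts)
    # head -n   : exit after N lines so the upstream commands stop early
    ls -f -- "$1" | grep -vxF -e . -e .. | head -n "$2"
}

# Example call (the path is only a placeholder):
#   first_entries /some/big/directory 10
```

Hidden files are still included here; they could be dropped with an extra grep if that is not wanted. As with any line-based approach, names containing newlines are not handled.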



Source: https://stackoverflow.com/questions/24723689/how-to-read-first-n-th-files-from-directory-pleaso-not-a-head-n-solution
