Question
I have a directory with more than 60,000 files. How can I get only N of them without using a find | head -n or ls | head -n solution, since it takes find and ls too much time to read the list of files? Are there any options for ls and find, or any other programs, that can help save time?
Answer 1:
For what it's worth:
# Create 60000 files
sh$ for i in {0..100}; do
      for j in {0..600}; do
        touch $(printf "%05d" $(($i+$j*100)));
      done;
    done
On Linux Debian Wheezy x86_64 w/ext4 file system:
sh$ time bash -c 'ls | head -n 50000 | tail -10'
49990
49991
49992
49993
49994
49995
49996
49997
49998
49999
real 0m0.248s
user 0m0.212s
sys 0m0.024s
sh$ time bash -c 'ls -f | head -n 50000 | tail -10'
27235
02491
55530
44435
24255
47247
16033
45447
18434
35303
real 0m0.051s
user 0m0.016s
sys 0m0.028s
sh$ time bash -c 'find | head -n 50000 | tail -10'
./02491
./55530
./44435
./24255
./47247
./16033
./45447
./18434
./35303
./07658
real 0m0.051s
user 0m0.024s
sys 0m0.024s
sh$ time bash -c 'ls -f | sed -n 49990,50000p'
30950
27235
02491
55530
44435
24255
47247
16033
45447
18434
35303
real 0m0.046s
user 0m0.032s
sys 0m0.016s
Of course, the following two are faster, as they only take the first entries (and they interrupt the producer process with a broken pipe once the required number of "lines" has been read; a small demonstration follows at the end of this answer):
sh$ time bash -c 'ls -f | sed 1000q >/dev/null'
real 0m0.008s
user 0m0.004s
sys 0m0.000s
sh$ time bash -c 'ls -f | head -1000>/dev/null'
real 0m0.008s
user 0m0.000s
sys 0m0.004s
Interestingly enough, with sed we spend our time in the user-space process, whereas with head it is spent in sys. After several runs, the results are consistent.
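As a quick demonstration of that broken-pipe effect (run from inside the test directory; this is only a sketch and the exact status can vary by shell), the producer's exit status shows it was killed by SIGPIPE rather than left to list all 60000 entries:
sh$ (ls -f; echo "ls exit status: $?" >&2) | head -n 10 >/dev/null
# typically prints: ls exit status: 141   (141 = 128 + SIGPIPE)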
Answer 2:
You could write your own simple utility in C.
#include <stdio.h>
#include <stdlib.h>     /* atoi */
#include <string.h>     /* strcmp */
#include <sys/types.h>
#include <dirent.h>

int main(int argc, char **argv) {
    DIR *dir;
    struct dirent *ent;
    int i = 0, n = 0;

    if (argc != 3) {
        fprintf(stderr, "usage: %s directory count\n", argv[0]);
        return 1;
    }
    n = atoi(argv[2]);
    dir = opendir(argv[1]);
    if (dir == NULL) {
        perror(argv[1]);
        return 1;
    }
    while ((ent = readdir(dir)) != NULL) {
        /* skip the "." and ".." entries */
        if (strcmp(ent->d_name, ".") == 0 ||
            strcmp(ent->d_name, "..") == 0)
            continue;
        /* stop once n names have been printed */
        if (i++ >= n)
            break;
        printf("%s\n", ent->d_name);
    }
    closedir(dir);
    return 0;
}
This is just a quick and dirty first draft, but you get the idea.
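If you want to try it, a possible way to build and run it (the source file name, binary name, and directory path below are only placeholders):
sh$ cc -o read_first_n read_first_n.c
sh$ ./read_first_n /path/to/bigdir 10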
Answer 3:
You can use sed with q:
find ... | sed 10q    ## prints lines 1 through 10
That would make sed exit after the 10th line, which should also make find finish its work sooner.
Another way is to use awk, but sed is still more efficient:
find ... | awk 'NR==11{exit}1'
Or
find ... | awk '1;NR==10{exit}'
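For example (the directory path here is only a placeholder), with GNU find you can restrict the walk to the directory itself; note that find prints the starting directory as its first output line, so ask for one extra line:
sh$ find /path/to/bigdir -maxdepth 1 | sed 11q   # 1 directory line + 10 entries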
Answer 4:
ls -f directory | sed -n 1,10p    # print lines 1-10
Option of ls:
-f : do not sort
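Note that with GNU ls, -f also implies -a, so the . and .. entries (and any dot files) appear in the listing. If you want to drop the two directory entries first, something along these lines should work:
sh$ ls -f directory | grep -vE '^\.{1,2}$' | sed -n 1,10p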
Source: https://stackoverflow.com/questions/24723689/how-to-read-first-n-th-files-from-directory-pleaso-not-a-head-n-solution