The C routines opendir(), readdir() and closedir() provide a way for me to traverse a directory structure. However, each dirent structure returned by readdir() does not seem
Have you tried ftw(), aka File Tree Walk?
Snippet from man 3 ftw:
int ftw(const char *dir, int (*fn)(const char *file, const struct stat *sb, int flag), int nopenfd);
ftw() walks through the directory tree starting from the indicated directory dir. For each found entry in the tree, it calls fn() with the full pathname of the entry, a pointer to the stat(2) structure for the entry and an int flag
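For what it's worth, a minimal sketch of how that might be used. The names visit() and walk_tree() and the nopenfd value of 16 are just my own choices for illustration; nothing in the man page dictates them. Since ftw() has no user-data argument, the counters are file-scope variables.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* assumption: helps under strict -std= modes */
#endif
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* ftw() passes no user data to the callback, so the tallies are globals. */
static long n_files, n_dirs;

/* Called by ftw() once per entry; returning 0 tells ftw() to keep going. */
static int visit(const char *path, const struct stat *sb, int flag)
{
    switch (flag) {
    case FTW_D:                 /* a directory */
        n_dirs++;
        printf("%s/\n", path);
        break;
    case FTW_F:                 /* a regular file */
        n_files++;
        printf("%s (%lld bytes)\n", path, (long long)sb->st_size);
        break;
    default:                    /* unreadable directory, failed stat, etc. */
        fprintf(stderr, "skipped: %s\n", path);
        break;
    }
    return 0;
}

/* Hypothetical helper: walk the tree at root; 0 on success, -1 on error. */
int walk_tree(const char *root)
{
    n_files = n_dirs = 0;
    return ftw(root, visit, 16);    /* hold at most 16 directories open */
}
```

Calling walk_tree(".") prints every entry with its full pathname, which is exactly the part readdir() leaves you to assemble yourself.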
Probably overkill for your application, but here's a library designed to traverse a directory tree with hundreds of millions of files.
https://github.com/hpc/libcircle
You seem to be missing one basic point: directory traversal involves reading data from the disk. Even when/if that data is in the cache, you end up going through a fair amount of code to get it from the cache into your process. Paths are also generally pretty short -- any more than a couple hundred bytes is pretty unusual. Together these mean that you can pretty reasonably build up strings for all the paths you need without any real problem. The time spent building the strings is still pretty minor compared to the time to read data from the disk. That means you can normally ignore the time spent on string manipulation, and work exclusively at optimizing disk usage.
My own experience has been that for most directory traversal a breadth-first search is usually preferable -- as you're traversing the current directory, put the full paths to all sub-directories in something like a priority queue. When you're finished traversing the current directory, pull the first item from the queue and traverse it, continuing until the queue is empty. This generally improves cache locality, so it reduces the amount of time spent reading the disk. Depending on the system (disk speed vs. CPU speed, total memory available, etc.) it's nearly always at least as fast as a depth-first traversal, and can easily be up to twice as fast (or so).
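A rough sketch of that breadth-first idea, assuming a plain FIFO linked list is good enough as the queue and that lstat() is used to tell directories apart. The names walk_breadth_first(), enqueue() and dequeue() and the 4096-byte path buffer are my own illustrative choices, not anything from a library.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* assumption: exposes lstat() and strdup() */
#endif
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* One pending directory in a singly linked FIFO queue. */
struct node { char *path; struct node *next; };
struct queue { struct node *head, *tail; };

/* Append a copy of path to the queue; returns 0 on success, -1 on failure. */
static int enqueue(struct queue *q, const char *path)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->path = strdup(path);
    if (n->path == NULL) {
        free(n);
        return -1;
    }
    n->next = NULL;
    if (q->tail != NULL)
        q->tail->next = n;
    else
        q->head = n;
    q->tail = n;
    return 0;
}

/* Remove the oldest path; the caller frees it. NULL when the queue is empty. */
static char *dequeue(struct queue *q)
{
    struct node *n = q->head;
    char *path;
    if (n == NULL)
        return NULL;
    q->head = n->next;
    if (q->head == NULL)
        q->tail = NULL;
    path = n->path;
    free(n);
    return path;
}

/* Hypothetical helper: breadth-first walk from root. Prints each
 * non-directory entry and returns how many were seen (-1 if the root
 * could not even be queued). */
long walk_breadth_first(const char *root)
{
    struct queue q = { NULL, NULL };
    char *dir;
    long count = 0;

    if (enqueue(&q, root) != 0)
        return -1;
    while ((dir = dequeue(&q)) != NULL) {
        DIR *d = opendir(dir);
        struct dirent *ent;
        if (d == NULL) {
            perror(dir);
            free(dir);
            continue;
        }
        while ((ent = readdir(d)) != NULL) {
            char path[4096];    /* arbitrary limit; longer paths are skipped */
            struct stat sb;
            int n;
            if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
                continue;
            n = snprintf(path, sizeof path, "%s/%s", dir, ent->d_name);
            if (n < 0 || (size_t)n >= sizeof path || lstat(path, &sb) != 0)
                continue;
            if (S_ISDIR(sb.st_mode)) {
                /* Defer the subdirectory until this directory is finished. */
                if (enqueue(&q, path) != 0)
                    fprintf(stderr, "out of memory, skipping %s\n", path);
            } else {
                puts(path);
                count++;
            }
        }
        closedir(d);
        free(dir);
    }
    return count;
}
```

Each directory is read to the end before any of its subdirectories is opened, so only one DIR stream is open at a time and the string building is a single snprintf() per entry.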
The way to use opendir/readdir/closedir is to make the function recursive! Have a look at the snippet here on Dreamincode.net.
Hope this helps.
EDIT Thanks R.Sahu, the link had expired. I found it via the Wayback archive and took the liberty of adding it to a gist. Please remember to check the license and attribute the original author of the source! :)
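In case the link goes away again, here is a minimal sketch of the recursive approach. To be clear, this is not the Dreamincode snippet, just my own illustration; list_recursive() is a made-up name and the 4096-byte path buffer is an arbitrary limit. It builds the full path for each entry and uses lstat() to decide whether to recurse.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* assumption: exposes lstat() in strict modes */
#endif
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: print everything under dir, indented two spaces
 * per level, and return the number of entries printed. */
long list_recursive(const char *dir, int depth)
{
    DIR *d = opendir(dir);
    struct dirent *ent;
    long count = 0;

    if (d == NULL) {
        perror(dir);
        return 0;
    }
    while ((ent = readdir(d)) != NULL) {
        char path[4096];        /* arbitrary limit; longer paths are skipped */
        struct stat sb;
        int n;

        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        /* readdir() only gives the entry name, so build the full path. */
        n = snprintf(path, sizeof path, "%s/%s", dir, ent->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;
        /* lstat() does not follow symlinks, which avoids symlink loops. */
        if (lstat(path, &sb) != 0)
            continue;
        printf("%*s%s%s\n", depth * 2, "", ent->d_name,
               S_ISDIR(sb.st_mode) ? "/" : "");
        count++;
        if (S_ISDIR(sb.st_mode))
            count += list_recursive(path, depth + 1);
    }
    closedir(d);
    return count;
}
```

Start it with list_recursive(".", 0). Note that each level of recursion keeps one directory stream open, so a very deep tree could run into the open-file limit.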
Instead of opendir(), you can use a combination of openat(), dirfd() and fdopendir() and construct a recursive function to walk a directory tree:
#define _DEFAULT_SOURCE         /* exposes d_type and DT_DIR on glibc */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <dirent.h>

/* Print every entry under parent, indented one space per level. */
void
dir_recurse (DIR *parent, int level)
{
    struct dirent *ent;
    DIR *child;
    int fd;

    while ((ent = readdir(parent)) != NULL) {
        if ((strcmp(ent->d_name, ".") == 0) ||
            (strcmp(ent->d_name, "..") == 0)) {
            continue;
        }
        if (ent->d_type == DT_DIR) {
            printf("%*s%s/\n", level, "", ent->d_name);
            /* Open the child relative to the parent's fd; no path string needed. */
            fd = openat(dirfd(parent), ent->d_name, O_RDONLY | O_DIRECTORY);
            if (fd == -1) {
                perror("openat");
                continue;
            }
            child = fdopendir(fd);
            if (child == NULL) {
                perror("fdopendir");
                close(fd);
                continue;
            }
            dir_recurse(child, level + 1);
            closedir(child);    /* also closes fd */
        } else {
            printf("%*s%s\n", level, "", ent->d_name);
        }
    }
}

int
main (void)
{
    DIR *root;

    root = opendir(".");
    if (root == NULL) {
        perror("opendir");
        return 1;
    }
    dir_recurse(root, 0);
    closedir(root);
    return 0;
}
Here readdir() is still used to get the next directory entry. If the next entry is a directory, then we find the parent directory fd with dirfd() and pass this, along with the child directory name, to openat(). The resulting fd refers to the child directory. This is passed to fdopendir(), which returns a DIR * pointer for the child directory, which can then be passed to our dir_recurse(), where it again will be valid for use with readdir() calls.
This program recurses over the whole directory tree rooted at the current directory (.). Entries are printed, indented by 1 space per directory level. Directories are printed with a trailing /.
On ideone.
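One caveat with relying on d_type: on some filesystems readdir() reports DT_UNKNOWN instead of the real type, so a directory would be printed as a plain file and never entered. A possible fallback, sketched here under the assumption that fstatat() is available, stays within the same *at() family and still avoids building path strings. The name entry_is_dir() is my own invention.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE     /* exposes d_type and the DT_* constants on glibc */
#endif
#include <dirent.h>
#include <fcntl.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if ent is a directory, 0 if it is not,
 * and -1 if its type could not be determined. */
int entry_is_dir(DIR *parent, const struct dirent *ent)
{
    struct stat sb;

    if (ent->d_type == DT_DIR)
        return 1;
    if (ent->d_type != DT_UNKNOWN)
        return 0;
    /* The filesystem gave no type, so ask relative to the parent's fd.
     * AT_SYMLINK_NOFOLLOW makes this behave like lstat(). */
    if (fstatat(dirfd(parent), ent->d_name, &sb, AT_SYMLINK_NOFOLLOW) != 0)
        return -1;
    return S_ISDIR(sb.st_mode) ? 1 : 0;
}
```

In a walker like the one in this answer, the test ent->d_type == DT_DIR could then become entry_is_dir(parent, ent) == 1.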