I was looking at an example in K&R 2 (8.6 Example - Listing Directories). It is a stripped-down version of the Linux command ls or Windows' dir.
In Version 7 UNIX, there was only one Unix filesystem, and its directories had a simple on-disk format: an array of struct direct. Reading it and interpreting the result was trivial, so a dedicated syscall would have been redundant.
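To make "trivial" concrete, here is a minimal sketch of interpreting such a raw directory image. It assumes the classic V7 layout (a 16-bit inode number followed by a 14-byte name) and that the bytes are in host byte order. The names v7_direct, V7_DIRSIZ, and v7_list are made up for this illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define V7_DIRSIZ 14  /* assumed maximum name length in a V7 directory entry */

/* Assumed on-disk layout of one entry: 16 bytes in total.
 * A name of exactly 14 bytes is NOT NUL-terminated. */
struct v7_direct {
    uint16_t d_ino;               /* inode number; 0 marks an unused slot */
    char     d_name[V7_DIRSIZ];   /* NUL-padded file name */
};

/* Walk a raw directory image (as read() would have returned it) and
 * print every live entry. Returns the number of entries in use. */
int v7_list(const unsigned char *raw, size_t len)
{
    int used = 0;

    for (size_t off = 0; off + sizeof(struct v7_direct) <= len;
         off += sizeof(struct v7_direct)) {
        struct v7_direct d;
        char name[V7_DIRSIZ + 1];

        memcpy(&d, raw + off, sizeof d);   /* host byte order assumed */
        if (d.d_ino == 0)
            continue;                      /* free slot, skip it */

        memcpy(name, d.d_name, V7_DIRSIZ); /* add the missing terminator */
        name[V7_DIRSIZ] = '\0';
        printf("%5u %s\n", (unsigned)d.d_ino, name);
        used++;
    }
    return used;
}
```

A caller would fill a buffer with read() on the directory's file descriptor and pass it to v7_list. That only works where the kernel still hands out raw directory bytes in this format.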
In modern times there are many kinds of filesystems that can be mounted by Linux and other Unix-like systems (ext4, ZFS, NTFS!), some of which have complex directory formats. You can't do anything sensible with the raw bytes of an arbitrary directory. So the kernel has taken on the responsibility of providing a generic interface to directories as abstract objects. readdir is the central piece of that interface.
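For comparison, a rough sketch of the same listing through the abstract interface, using the POSIX opendir/readdir/closedir functions. The helper name list_dir is made up for this example.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>

/* Print each name in `path`, whatever filesystem it lives on.
 * Returns the number of entries, or -1 on error. */
int list_dir(const char *path)
{
    DIR *dir = opendir(path);
    int count = 0;
    int saved;

    if (dir == NULL) {
        perror(path);
        return -1;
    }

    for (;;) {
        struct dirent *ent;

        errno = 0;                 /* lets us tell end-of-directory from an error */
        ent = readdir(dir);
        if (ent == NULL)
            break;
        printf("%s\n", ent->d_name);
        count++;
    }

    saved = errno;                 /* nonzero only if readdir() failed */
    closedir(dir);
    return saved == 0 ? count : -1;
}
```

Calling list_dir(".") prints the current directory's entries. The code never needs to know how the underlying filesystem stores them.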
Some modern unices still allow read() on a directory, because it's part of their history. Linux's history began in the 1990s, when it was already obvious that read() on a directory was never going to be useful, so Linux has never allowed it.
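You can check this yourself with a few lines of C. The sketch below assumes Linux, where opening a directory read-only succeeds but read() on it fails with EISDIR. The function name read_dir_errno is made up for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to read() raw bytes from `path`.
 * Returns 0 if read() succeeded, the errno value if read() failed,
 * or -1 if the path could not be opened at all. */
int read_dir_errno(const char *path)
{
    char buf[512];
    int result;
    int fd = open(path, O_RDONLY);   /* opening a directory read-only is allowed */

    if (fd < 0)
        return -1;

    if (read(fd, buf, sizeof buf) < 0)
        result = errno;              /* EISDIR on Linux when path is a directory */
    else
        result = 0;

    close(fd);
    return result;
}
```

On Linux, read_dir_errno(".") should return EISDIR. A system that still allows raw directory reads would return 0 instead.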
Linux does provide a readdir syscall, but it's not used very much anymore, because something better has come along: getdents. readdir only returns one directory entry at a time, so if you use the readdir syscall in a loop to get a list of files in a directory, you enter the kernel on every loop iteration. getdents returns multiple entries into a buffer.
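Here is a sketch of what a batched listing looks like when you bypass the C library and call getdents64 directly through syscall(). It is Linux-only. glibc does not export the record type, so the struct is declared by hand following the layout in the getdents(2) man page. count_entries and its syscalls out-parameter are made up for this example, and the 32 KiB buffer size is an arbitrary choice.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout filled in by getdents64, declared by hand. */
struct linux_dirent64 {
    uint64_t       d_ino;     /* inode number */
    int64_t        d_off;     /* offset to the next record */
    unsigned short d_reclen;  /* size of this record in bytes */
    unsigned char  d_type;    /* file type */
    char           d_name[];  /* NUL-terminated name */
};

/* Print every entry in `path`; store the number of kernel entries made
 * in *syscalls. Returns the entry count, or -1 on error. */
int count_entries(const char *path, int *syscalls)
{
    _Alignas(struct linux_dirent64) char buf[32768];
    int count = 0;
    int fd = open(path, O_RDONLY);

    *syscalls = 0;
    if (fd < 0)
        return -1;

    for (;;) {
        long n = syscall(SYS_getdents64, fd, buf, sizeof buf);

        (*syscalls)++;
        if (n < 0) {              /* e.g. ENOTDIR if path is not a directory */
            close(fd);
            return -1;
        }
        if (n == 0)
            break;                /* end of directory */

        /* One syscall returned a whole batch of variable-length records. */
        for (long pos = 0; pos < n; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + pos);

            printf("%s\n", d->d_name);
            pos += d->d_reclen;
            count++;
        }
    }

    close(fd);
    return count;
}
```

For a directory whose entries fit in one buffer, syscalls will usually end up far smaller than the entry count, which is the whole point of the batched interface.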
readdir is, however, the standard interface, so glibc provides a readdir function that calls the getdents syscall instead of the readdir syscall. In an ordinary program you'll see readdir in the source code, but getdents in the strace output. The C library is helping performance by buffering, just as stdio does for regular files: when you call getchar(), it does a read() of a few kilobytes at a time instead of a bunch of single-byte read()s.
You'll never use the original unbuffered readdir syscall on a modern Linux system unless you run an executable that was compiled a long time ago, or go out of your way to bypass the C library.