Why do operating systems limit file descriptors?


Question


I ask this question after trying my best to research the best way to implement a message queue server. Why do operating systems put limits on the number of open file descriptors a process, and the system as a whole, can have?

My current server implementation uses ZeroMQ and opens a subscriber socket for each connected WebSocket client, so a single process will only be able to handle clients up to the file descriptor limit. When I research the topic I find lots of information on how to raise system limits to levels as high as 64k descriptors, but it never mentions how that affects system performance, or why the default is 1k or lower in the first place.

My current approach is to dispatch messaging to all clients using a coroutine in its own loop, with a map of all clients and their subscription channels. But I would love to hear a solid answer about file descriptor limitations and how they affect applications that open one descriptor per client over persistent connections.


Answer 1:


It may be because a file descriptor value is an index into a file descriptor table. The number of possible file descriptors therefore determines the size of the table, and average users would not want half of their RAM used up by a file descriptor table that can handle millions of descriptors they will never need.
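A quick way to see the "index" behaviour is that open() always returns the lowest unused slot; this is a minimal sketch, and /dev/null is just a convenient file to open:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Descriptors are just indices: with 0/1/2 already taken by
         * stdin/stdout/stderr, successive opens return 3, 4, 5, ...
         * the lowest unused slot in the per-process table. */
        int a = open("/dev/null", O_RDONLY);
        int b = open("/dev/null", O_RDONLY);
        printf("a=%d b=%d\n", a, b);   /* typically prints a=3 b=4 */
        close(a);
        close(b);
        return 0;
    }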




Answer 2:


For performance purposes, the open file table needs to be statically allocated, so its size needs to be fixed. File descriptors are just offsets into this table, so all the entries need to be contiguous. You can resize the table, but this requires halting all threads in the process and allocating a new block of memory for the file table, then copying all entries from the old table to the new one. It's not something you want to do dynamically, especially when the reason you're doing it is because the old table is full!
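As a rough sketch (not any real kernel's layout) of why the table's size matters, a descriptor is nothing more than an index into a flat, fixed-size array:

    #define MAX_FDS 1024                     /* fixed at table-creation time */

    struct open_file;                        /* opaque per-open-file state */

    struct fd_table {
        struct open_file *entries[MAX_FDS];  /* fd N is simply entries[N] */
    };
    /* Growing beyond MAX_FDS means allocating a bigger array and copying
     * every entry over, which is exactly the resize cost described above. */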




Answer 3:


There are certain operations which slow down when you have lots of potential file descriptors. One example is the operation "close all file descriptors except stdin, stdout, and stderr" -- the only portable* way to do this is to attempt to close every possible file descriptor except those three, which can become a slow operation if you could potentially have millions of file descriptors open.

*: If you're willing to be non-portable, you can look in /proc/self/fd -- but that's beside the point.
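For illustration, a minimal sketch of that portable brute-force loop, assuming a POSIX system (the 1024 fallback is only a guess for when sysconf() reports no limit):

    #include <unistd.h>

    /* Close every descriptor above stderr by trying each possible value. */
    void close_all_fds(void)
    {
        long max = sysconf(_SC_OPEN_MAX);   /* per-process descriptor limit */
        if (max < 0)
            max = 1024;                     /* no limit reported: pick a guess */
        for (long fd = 3; fd < max; fd++)
            close(fd);                      /* EBADF on unused slots is harmless */
    }

The loop's cost grows with the descriptor limit, which is why raising the limit to millions makes this "simple" operation noticeably slow.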

This isn't a particularly good reason, but it is a reason. Another reason is simply to keep a buggy program (i.e., one that "leaks" file descriptors) from consuming too many system resources.




Answer 4:


On Unix systems, the fork() and fork()/exec() idiom for process creation requires iterating over all potential process file descriptors and attempting to close each one, typically leaving only a few file descriptors such as stdin, stdout, and stderr untouched or redirected elsewhere.

Since this is the Unix API for launching a process, it has to be done any time a new process is created, including when executing each and every non-built-in command invoked within shell scripts.
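A minimal sketch of that idiom, with error handling omitted (the spawn() name and the choice to keep descriptors 0-2 are just for illustration):

    #include <sys/types.h>
    #include <unistd.h>

    pid_t spawn(char *const argv[])
    {
        pid_t pid = fork();
        if (pid == 0) {                         /* child */
            long max = sysconf(_SC_OPEN_MAX);
            for (long fd = 3; fd < max; fd++)   /* keep stdin/stdout/stderr */
                close(fd);
            execvp(argv[0], argv);
            _exit(127);                         /* exec failed */
        }
        return pid;                             /* parent: caller waits on pid */
    }

Because the close loop runs on every spawn, a very high descriptor limit taxes every single process creation.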

Other factors to consider are that while some software may use sysconf(_SC_OPEN_MAX) to dynamically determine the number of files that may be opened by a process, a lot of software still uses the C library's default FD_SETSIZE, which is typically 1024 descriptors, and as such can never have more than that many files open regardless of any administratively defined higher limit.
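You can see both numbers on your own machine with something like this (FD_SETSIZE comes from <sys/select.h>; _SC_OPEN_MAX is the POSIX name for the runtime limit):

    #include <stdio.h>
    #include <sys/select.h>   /* FD_SETSIZE */
    #include <unistd.h>       /* sysconf() */

    int main(void)
    {
        /* FD_SETSIZE is baked in at compile time (commonly 1024), while
         * sysconf(_SC_OPEN_MAX) reflects the administratively set limit. */
        printf("FD_SETSIZE            = %d\n", FD_SETSIZE);
        printf("sysconf(_SC_OPEN_MAX) = %ld\n", sysconf(_SC_OPEN_MAX));
        return 0;
    }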

Unix has a legacy asynchronous I/O mechanism based on file descriptor sets, which use bit offsets to represent the files to wait on and the files that are ready or in an exception condition. It doesn't scale well for thousands of files, as these descriptor sets need to be set up and cleared on each pass around the run loop. Newer, non-standard APIs have appeared on the major Unix variants, including kqueue() on *BSD and epoll() on Linux, to address the performance shortcomings when dealing with a large number of descriptors.
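To make the scaling problem concrete, here is a sketch of the select() pattern (fds/nfds stand in for your list of connected clients, and every value must be below FD_SETSIZE):

    #include <sys/select.h>

    /* The fd_set is a fixed-size bit array, so it has to be rebuilt from
     * the full client list on every pass around the run loop. */
    int wait_readable(const int *fds, int nfds)
    {
        fd_set readfds;
        int maxfd = -1;

        FD_ZERO(&readfds);                  /* clear the whole bit set... */
        for (int i = 0; i < nfds; i++) {    /* ...then set one bit per client */
            FD_SET(fds[i], &readfds);
            if (fds[i] > maxfd)
                maxfd = fds[i];
        }
        /* Blocks until something is readable; the kernel scans up to
         * maxfd+1 bits on every call, so cost grows with the highest fd. */
        return select(maxfd + 1, &readfds, NULL, NULL, NULL);
    }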

It is important to note that select()/poll() is still used by a lot of software, as for a long time it was the POSIX API for asynchronous I/O. The modern POSIX asynchronous I/O approach is the aio_* API, but it is likely not competitive with the kqueue() or epoll() APIs. I haven't used aio in anger, and it certainly wouldn't have the performance and semantics offered by the native approaches, in the way they can aggregate multiple events for higher performance. kqueue() on *BSD has really good edge-triggered semantics for event notification, allowing it to replace select()/poll() without forcing large structural changes to your application. Linux epoll() follows the lead of *BSD kqueue() and improves upon it, which in turn followed the lead of Sun/Solaris evports.
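For contrast, a minimal Linux epoll() sketch (listen_fd is assumed to be an already-listening socket; error handling omitted): readiness is reported per descriptor, so the cost scales with active events rather than with the size of a bit set.

    #include <sys/epoll.h>
    #include <unistd.h>

    int wait_for_events(int listen_fd)
    {
        int epfd = epoll_create1(0);
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
        epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);  /* register once */

        struct epoll_event ready[64];
        int n = epoll_wait(epfd, ready, 64, -1);         /* block for events */
        for (int i = 0; i < n; i++) {
            /* ready[i].data.fd is ready; accept()/read() would go here */
        }
        close(epfd);
        return n;
    }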

The upshot is that increasing the number of allowed open files across the system adds both time and space overhead for every process in the system, even if those processes can't make use of the extra descriptors given the API they are using. There are also aggregate, system-wide limits on the number of open files allowed. This older but interesting tuning summary for 100k-200k simultaneous connections using nginx on FreeBSD provides some insight into the overheads of maintaining open connections, as does another one covering a wider range of systems but "only" seeing 10K connections as the Mt. Everest.
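The per-process limit those tuning guides adjust is visible (and raisable up to the hard limit) through getrlimit()/setrlimit(); a minimal sketch follows. On Linux the system-wide ceiling is a separate knob, e.g. the fs.file-max sysctl.

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit rl;

        /* RLIMIT_NOFILE: the soft limit is what the process starts with,
         * the hard limit is the ceiling an unprivileged process may raise
         * its soft limit to via setrlimit(). */
        if (getrlimit(RLIMIT_NOFILE, &rl) == 0)
            printf("soft=%llu hard=%llu\n",
                   (unsigned long long)rl.rlim_cur,
                   (unsigned long long)rl.rlim_max);
        return 0;
    }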

Probably the best reference for Unix systems programming is W. Richard Stevens' Advanced Programming in the UNIX Environment.



Source: https://stackoverflow.com/questions/6798365/why-do-operating-systems-limit-file-descriptors
