What are the performance and reliability implications of watching too many files?

眼角桃花 2021-01-26 12:22

In Facebook's Watchman application, somewhere in the docs it says this:

Most systems have a finite limit on the number of directories that can be watched

1 answer
  • 2021-01-26 12:39

    Sorry if those docs aren't as clear as you might like.

    First up, we built Watchman specifically to accelerate tools that have to operate on extremely large trees, particularly this one, which has only continued to grow since this was written:

    https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/

    Facebook's main source repository is enormous--many times larger than even the Linux kernel, which checked in at 17 million lines of code and 44,000 files in 2013

    I don't have any more recent public numbers on the repo size to hand, but the main point is that this should work just fine for the vast majority of applications.

    Now to the behavior of the system when limits are exceeded. The answer depends on which operating system you're using.

    There are two main system limits that affect this behavior. One is a direct limit on the number of watched items; when it is exceeded, you cannot watch anything else. When running on Linux, Watchman treats this case as unrecoverable and flags itself as poisoned; in this state it cannot accurately report file changes within the watched directories until you raise the system limit or give up on watching that part of the filesystem.
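
    As a rough illustration of the Linux case, here is a minimal Python sketch that reads the relevant limits and compares a tree's directory count against the watch limit. It assumes the limits in question are the inotify sysctls exposed under /proc/sys/fs/inotify (the text above does not name them), and the helper names are made up for this example. The watch limit is per user and shared with every other process that uses inotify, so treat the result as an estimate only.

```python
import os
from pathlib import Path

# Assumption: on Linux the limits discussed above are the inotify sysctls.
INOTIFY_DIR = Path("/proc/sys/fs/inotify")
LIMIT_NAMES = ("max_user_watches", "max_queued_events", "max_user_instances")


def read_inotify_limits(base=INOTIFY_DIR):
    """Return the current limits, or None for any value that cannot be read
    (for example on a non-Linux system)."""
    limits = {}
    for name in LIMIT_NAMES:
        try:
            limits[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            limits[name] = None
    return limits


def count_directories(root):
    """Count the root plus every directory entry below it.

    This approximates the number of watches a per-directory watcher needs."""
    total = 1  # the root itself
    for _dirpath, dirnames, _filenames in os.walk(root):
        total += len(dirnames)
    return total


def fits_within_watch_limit(root, limits=None):
    """True/False if the tree fits under max_user_watches, None if unknown."""
    limits = read_inotify_limits() if limits is None else limits
    max_watches = limits.get("max_user_watches")
    if max_watches is None:
        return None
    return count_directories(root) <= max_watches


if __name__ == "__main__":
    print(read_inotify_limits())
    print(fits_within_watch_limit("."))
```

    If the tree does not fit, the options are the ones described above: raise the limit or watch less of the filesystem.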

    When running on OS X, Watchman can't tell if this limit is exceeded due to poor diagnostics in the fsevents API; the best we can tell is that we were unable to initiate a watch. Because fsevents doesn't tell us what is going on, and because this limit is not user configurable, we can't put the process into a poisoned state.

    The other system limit is on the number of items that the kernel has buffered up for the Watchman process to consume. If that buffer overflows, the kernel starts to drop change notifications. It informs Watchman that it did so, which causes Watchman to perform a tree recrawl (likely expensive, given that the tree is presumably large) to make sure that it can rediscover any changes it might have missed due to the overflow.
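
    The overflow-then-recrawl behavior can be sketched with a toy model. This is not Watchman's implementation; the class and function names are hypothetical, and the "kernel" here is just a bounded in-memory queue that drops events and sets a flag once it is full.

```python
from collections import deque


class BoundedEventQueue:
    """Toy stand-in for the kernel's per-process notification buffer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()
        self.overflowed = False

    def push(self, path):
        """Queue a change notification; drop it and flag overflow if full."""
        if len(self.events) >= self.capacity:
            self.overflowed = True
            return False
        self.events.append(path)
        return True

    def drain(self):
        """Return the queued events and the overflow flag, then reset both."""
        events = list(self.events)
        overflowed = self.overflowed
        self.events.clear()
        self.overflowed = False
        return events, overflowed


def consume(queue, recrawl):
    """Return (changed_paths, recrawled).

    After an overflow the queued events are incomplete, so the consumer
    falls back to `recrawl`, a callable that rescans the whole tree."""
    events, overflowed = queue.drain()
    if overflowed:
        return set(recrawl()), True
    return set(events), False


if __name__ == "__main__":
    queue = BoundedEventQueue(capacity=2)
    for path in ("a.txt", "b.txt", "c.txt"):  # the third event is dropped
        queue.push(path)
    full_tree = lambda: {"a.txt", "b.txt", "c.txt", "d.txt"}
    print(consume(queue, full_tree))
```

    The point of the model is the cost asymmetry: the normal path touches only the reported paths, while the overflow path has to walk everything.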

    OS X has a similar limit and similar recovery behavior, but does not allow the user to raise the limit. I've yet to observe this happening on OS X in the wild, so I'm assuming that the system default is a pretty reasonable one.

    As for practical limits for various file sizes, it really depends on your system; the filesystem, the storage device, the CPU power and the other applications you may be running on that system impact the rate at which changes can be applied to the filesystem and reported by the kernel, and the rate at which your system will be able to consume the events from the kernel.

    The rate at which you change those files is a big factor; if you have a very large and busy tree that changes frequently (hundreds of engineers making multiple commits per day and rebasing frequently) then you have an increased risk of hitting the recrawl case.

    There's no one-size-fits-all answer for tuning the system limits; you'll need to try it out and bump up the limits if/when you hit them.
