Question
I'm trying to find out what the performance of a large directory structure would be if deep directories were accessed on a shared NFS filesystem. The structure would be very large: 4 levels of nested directories, with 1024 directories at each level (1024 at the root, 1024 in each of those subdirectories, and so on).
This filesystem would be on a network repository that users would be accessing for their personal information. The data would be replicated on multiple servers and load-balanced, but still, each machine would have a decent load at all times.
If the 4th level contained the information the users were looking for, how bad would the performance be, especially if everyone were accessing different subdirectories? Could this be mitigated by caching inode information, or not?
I've been searching on this for a while, but I'm primarily finding information on large files rather than large directory structures.
Answer 1:
I did that at my work once. Don't remember the exact numbers offhand, but I think it was 8 levels deep, 10 subdirectories in each level (user id 87654321 maps to directory 8/7/6/5/4/3/2/1/). Turned out that was not such a great idea; started running into problems with filesystem inode number limits, iirc (10^10 = 10,000,000,000 directories, not good). Switched to more subdirectories per level and many fewer levels; problems went away. Your situation sounds more manageable, but still, check that your filesystem would support the kinds of file and directory counts that you're anticipating.
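A minimal sketch of that kind of id-to-path scheme, generalized from splitting decimal digits to an arbitrary fan-out (the function name, zero-padding, and defaults are illustrative assumptions, not anything from the answer above):

```python
import os

def id_to_path(user_id: int, levels: int = 4, fanout: int = 1024) -> str:
    """Map a numeric user id to a nested directory path,
    one component per level, each in the range [0, fanout)."""
    parts = []
    for _ in range(levels):
        user_id, bucket = divmod(user_id, fanout)
        parts.append(f"{bucket:04d}")
    # Most significant bucket first, mirroring the 8/7/6/5/4/3/2/1 layout.
    return os.path.join(*reversed(parts))

# With the defaults above, id_to_path(87654321) -> '0000/0083/0607/0945'
```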
Answer 2:
The answer here is going to be highly dependent on your operating system; can you provide more information? I have found that file open times under Linux are reasonable up to directory sizes in the small tens of thousands, but I have not tried any tests with directory structures as large as yours. (You do know that 1024 to the fourth power is 1,099,511,627,776, right? And that that's something like 180 times the population of the earth?)
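For scale, a quick back-of-the-envelope calculation of the directory counts implied by the question's 4 levels of 1024 (counting both the leaf level and the intermediate levels):

```python
FANOUT = 1024
LEVELS = 4

leaf_dirs = FANOUT ** LEVELS                                  # directories at the deepest level
total_dirs = sum(FANOUT ** i for i in range(1, LEVELS + 1))   # all levels combined

print(f"leaf directories:  {leaf_dirs:,}")    # 1,099,511,627,776
print(f"total directories: {total_dirs:,}")   # 1,100,586,419,200
```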
Answer 3:
Seems like you'd just want to write a test app to generate 1024 folders, nested 4 levels deep to match your structure, with each folder containing some number (100 - 1000?) of 1 KB files, and then randomly find and access the files.
Track the access times over multiple passes and see if it's acceptable to your requirements.
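A minimal sketch of such a test harness, assuming Python is available on the client. ROOT is a hypothetical path on the NFS mount under test, and the fan-out is deliberately kept far below 1024 so the tree can actually be created in a reasonable time; scale the constants toward your real layout as far as is practical.

```python
import os
import random
import time

ROOT = "/mnt/nfs/dirtest"   # assumption: a directory on the NFS mount under test
FANOUT = 8                  # directories per level (kept small; 1024**4 leaves is not creatable)
LEVELS = 4
FILES_PER_LEAF = 10
FILE_SIZE = 1024            # 1 KB files, as suggested above

def build_tree():
    """Create the nested directory tree and populate each leaf with small files."""
    def recurse(path, level):
        if level == LEVELS:
            for i in range(FILES_PER_LEAF):
                with open(os.path.join(path, f"file{i}.dat"), "wb") as f:
                    f.write(os.urandom(FILE_SIZE))
            return
        for d in range(FANOUT):
            sub = os.path.join(path, f"{d:04d}")
            os.makedirs(sub, exist_ok=True)
            recurse(sub, level + 1)
    recurse(ROOT, 0)

def random_leaf_file():
    """Pick a random file by walking one random branch down to a leaf."""
    path = ROOT
    for _ in range(LEVELS):
        path = os.path.join(path, f"{random.randrange(FANOUT):04d}")
    return os.path.join(path, f"file{random.randrange(FILES_PER_LEAF)}.dat")

def time_random_reads(n=1000):
    """Open and read n randomly chosen files, reporting the average latency."""
    start = time.perf_counter()
    for _ in range(n):
        with open(random_leaf_file(), "rb") as f:
            f.read()
    elapsed = time.perf_counter() - start
    print(f"{n} reads, {elapsed / n * 1000:.2f} ms average per file")

if __name__ == "__main__":
    build_tree()
    time_random_reads()
```

Running the read phase several times in a row (and from several client machines at once) would also show how much the NFS client's attribute and lookup caching helps on repeated passes.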
Source: https://stackoverflow.com/questions/80470/performance-of-an-large-directory-structure-networked-application