Context: We have a homegrown filesystem-backed caching library. We currently have performance problems with one installation due to a large number of entries (
Something to look at is how your disk subsystem is arranged. While disks are growing rapidly in size, they are not getting much faster in access time. Is a different disk arrangement (using more disks) or using SSD drives an option? An SSD, for example, has no moving parts and can touch 100K files in 10 seconds, making the warm-up unnecessary.
If you never need to stat or list the cache directory, and only ever stat and open files within it by full path, it should not really matter (at least not at the 100k-file level) how many files are in the directory.
Many caching frameworks and filesystem-heavy storage engines create subdirectories based on the first characters of the filenames in such scenarios, so that if you are storing a file "abcdefgh.png" in your cache, it would go into "cache/a/b/cdefgh.png" instead of just "cache/abcdefgh.png". This assumes that the distribution of the first two letters of your file names is roughly uniform across the character space.
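For illustration, here is a minimal sketch of that sharding scheme; the two-level layout, the `cacheRoot` location, and the `shardedPath` helper are my own assumptions, not part of your library:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ShardedCachePath {
    // Map "abcdefgh.png" to "<root>/a/b/cdefgh.png" using the first two
    // characters of the file name as directory levels.
    static Path shardedPath(Path cacheRoot, String fileName) {
        if (fileName.length() < 3) {
            // Too short to shard; fall back to the flat layout.
            return cacheRoot.resolve(fileName);
        }
        return cacheRoot
                .resolve(fileName.substring(0, 1))
                .resolve(fileName.substring(1, 2))
                .resolve(fileName.substring(2));
    }

    public static void main(String[] args) {
        Path root = Paths.get("cache");
        System.out.println(shardedPath(root, "abcdefgh.png")); // cache/a/b/cdefgh.png
    }
}
```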
Since, as you mentioned, your primary task that involves listing or traversing the directories is deleting outdated files, I would recommend that you create directories based on the date and/or time the file was cached, e.g. "cache/2010/12/04/22/abcdefgh.png", and, wherever you index the cache, be sure to index by filename AND date (especially if the index is in a database) so that you can quickly remove items by date from the index and then remove the corresponding directory.
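A sketch of that date-bucketed layout, assuming hour-level buckets; the formatter pattern and the `pathFor` helper are illustrative, not from your code:

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class DateBucketedCachePath {
    // Bucket entries by year/month/day/hour so that expiring a whole hour
    // (or day) of cached files becomes a single directory delete.
    private static final DateTimeFormatter BUCKET =
            DateTimeFormatter.ofPattern("yyyy/MM/dd/HH");

    static Path pathFor(Path cacheRoot, String fileName, LocalDateTime cachedAt) {
        return cacheRoot.resolve(BUCKET.format(cachedAt)).resolve(fileName);
    }

    public static void main(String[] args) {
        Path p = pathFor(Paths.get("cache"), "abcdefgh.png",
                LocalDateTime.of(2010, 12, 4, 22, 0));
        System.out.println(p); // cache/2010/12/04/22/abcdefgh.png (separator depends on platform)
    }
}
```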
I also believed that spreading files across subdirectories would speed up operations.
So I ran a test: I generated files named AAAA through ZZZZ (26^4 files, about 450K) and placed them in a single NTFS directory. I also placed identical files into subdirectories AA through ZZ (i.e. grouped the files by the first two letters of their names). Then I performed some tests: enumeration and random access. I rebooted the system after creation and between tests.
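For reference, this is roughly how such a test set could be generated; it is a reconstruction of the described setup, not the script actually used:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class GenerateTestFiles {
    public static void main(String[] args) throws IOException {
        Path flatRoot = Paths.get("flat");       // all 26^4 files in one directory
        Path shardedRoot = Paths.get("sharded"); // grouped by the first two letters
        Files.createDirectories(flatRoot);

        for (char a = 'A'; a <= 'Z'; a++)
            for (char b = 'A'; b <= 'Z'; b++) {
                Path bucket = shardedRoot.resolve("" + a + b);
                Files.createDirectories(bucket);
                for (char c = 'A'; c <= 'Z'; c++)
                    for (char d = 'A'; d <= 'Z'; d++) {
                        String name = "" + a + b + c + d;
                        // Empty files are enough for enumeration/random-access timing.
                        Files.write(flatRoot.resolve(name), new byte[0]);
                        Files.write(bucket.resolve(name), new byte[0]);
                    }
            }
    }
}
```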
The flat structure showed slightly better performance than the subdirectories. I believe this is because the directories are cached and NTFS indexes directory contents, so lookup is fast.
Note that full enumeration took about 3 minutes for 400K files in both cases. That is significant time, and subdirectories make it even worse.
Conclusion: on NTFS in particular, it makes no sense to group files into subdirectories if any of those files may be accessed at random. If you have a cache, I would also test grouping the files by date or by domain, on the assumption that some files are accessed more frequently than others and the OS then doesn't need to keep all directories in memory. However, for your number of files (under 100K) this probably wouldn't provide a significant benefit either. You need to measure such specific scenarios yourself, I think.
Update: I reduced my random-access test to touch only half of the files (AA through OO). The assumption was that this would involve the one flat directory but only half of the subdirectories (giving a bonus to the subdirectory case). The flat directory still performed better. So I assume that unless you have millions of files, keeping them in one flat directory on NTFS will be faster than grouping them into subdirectories.
How are you loading your cache? If you are using standard Java file system interaction, that is going to be your first bottleneck. Java is pretty bad at folder content iteration, and if you are doing checks against each file as you iterate (getting the modified date, making sure the File isn't a directory, etc.), performance can take a big hit, because each of those checks is a round trip to native code. Moving to a solution based on the native FindFirstFile may provide a significant (orders-of-magnitude) improvement: FindFirstFile returns all of the information about a file with each iteration step, whereas Java's File.listFiles() returns only the list of paths, and every subsequent query for attributes or other metadata is another round trip to the file system. Horribly, horribly inefficient.
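In pure Java, the closest you can get without JNI is the NIO.2 walk API, which hands you the attributes together with each entry instead of requiring separate per-file calls; whether the provider can gather them in a single pass depends on the platform. A sketch of the difference, with the indexing step left as a placeholder:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class CacheScan {
    // Old-style scan: every lastModified()/isDirectory() call is another
    // round trip to the file system on top of the listFiles() call.
    static void scanWithFileApi(File cacheDir) {
        File[] entries = cacheDir.listFiles();
        if (entries == null) return;
        for (File f : entries) {
            long modified = f.lastModified(); // extra native call
            boolean dir = f.isDirectory();    // extra native call
            // ... index the entry ...
        }
    }

    // NIO.2 scan: the visitor receives BasicFileAttributes together with
    // each entry, so no per-file follow-up attribute calls are needed.
    static void scanWithNio(Path cacheDir) throws IOException {
        Files.walkFileTree(cacheDir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                long modified = attrs.lastModifiedTime().toMillis();
                boolean dir = attrs.isDirectory();
                // ... index the entry ...
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```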
OK, that's out of the way. Next, raw iteration of a huge directory in NTFS isn't particularly slower than an n-ary tree approach (folders and subfolders, etc.). With FAT32 this was a very big deal, but NTFS handles this sort of thing pretty well. That said, splitting into subfolders opens up some natural parallelization opportunities that are much harder to achieve with a single folder: if you can spawn 10 or 15 threads, each hitting a separate folder, you can effectively eliminate disk latency as a contributing factor.
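A sketch of that kind of parallel warm-up, assuming the sharded layout discussed above; the thread count, directory names, and `scanShard` placeholder are illustrative:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelWarmup {
    public static void main(String[] args) throws IOException, InterruptedException {
        Path cacheRoot = Paths.get("cache");   // assumed layout: cache/<shard>/<files>
        ExecutorService pool = Executors.newFixedThreadPool(10);

        try (DirectoryStream<Path> shards = Files.newDirectoryStream(cacheRoot)) {
            for (Path shard : shards) {
                if (Files.isDirectory(shard)) {
                    // Each worker scans one subfolder independently, keeping
                    // several disk requests in flight at once.
                    pool.submit(() -> scanShard(shard));
                }
            }
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    private static void scanShard(Path shard) {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(shard)) {
            for (Path file : files) {
                // ... load the entry into the in-memory index ...
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```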
I would suggest that you start with profiling (you knew that already, of course) and see where the bulk of the load time is coming from. You might be surprised: in one of our apps that does a lot of file-list processing, I was shocked to find how much time we were losing to isDirectory() checks; a simple change, doing the date compare before the directory/file determination, gave us a 30% improvement in iteration speed.
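The reordering described is just a short-circuit: test the more selective condition (the date) first so the directory check only runs for entries that are actually stale. A hedged sketch with hypothetical names:

```java
import java.io.File;

public class EvictionScan {
    // Do the date compare first; isDirectory() is only called for entries
    // that are already past the cutoff, which is usually the small minority.
    static boolean shouldEvict(File entry, long cutoffMillis) {
        if (entry.lastModified() >= cutoffMillis) {
            return false;               // still fresh; skip the directory check
        }
        return !entry.isDirectory();    // only evict stale regular files
    }
}
```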