NTFS directory has 100K entries. How much performance boost if spread over 100 subdirectories?

长发绾君心 2021-01-19 00:10

Context: We have a homegrown filesystem-backed caching library. We currently have performance problems with one installation due to a large number of entries (

4 Answers
  •  鱼传尺愫
    2021-01-19 01:03

    How are you loading your cache? If you are using standard Java file system interaction, that's going to be your first bottleneck - Java is pretty bad at folder content iteration - and if you are doing checks against each file as you iterate (getting the modified date, making sure the File isn't a directory, etc.), performance can take a big hit (these all involve round trips to native land). Moving to a solution based on the native FindFirstFile API may provide a significant (orders-of-magnitude) improvement. FindFirstFile returns all of the information about a file with each iteration step, whereas Java's File.listFiles() returns only a list of paths; each subsequent query for attributes or other metadata is another round trip to the file system. Horribly, horribly inefficient.
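    A similar "attributes delivered with the iteration" effect is available in pure Java via NIO.2 (Java 7+): `Files.walkFileTree` passes each entry's `BasicFileAttributes` to the visitor, so you never make a separate `isDirectory()`/`lastModified()` call per file. A minimal sketch (the class name and the depth-1 restriction are my own choices, not from the question):

    ```java
    import java.io.IOException;
    import java.nio.file.*;
    import java.nio.file.attribute.BasicFileAttributes;
    import java.util.EnumSet;

    public class CacheScan {
        // Count regular files one level deep. walkFileTree hands each entry's
        // BasicFileAttributes to the visitor, so no extra per-file stat round
        // trips (isDirectory, lastModified, ...) are needed.
        static long countFiles(Path dir) throws IOException {
            final long[] count = {0};
            Files.walkFileTree(dir, EnumSet.noneOf(FileVisitOption.class), 1,
                    new SimpleFileVisitor<Path>() {
                        @Override
                        public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                            if (attrs.isRegularFile()) {
                                count[0]++;
                            }
                            return FileVisitResult.CONTINUE;
                        }
                    });
            return count[0];
        }
    }
    ```

    On Windows, the NIO.2 provider is implemented on top of the same native directory-enumeration machinery, which is why it tends to beat a `File.listFiles()` loop that re-stats every entry.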

    OK - that's out of the way. Next, raw iteration of a huge directory in NTFS isn't particularly slower than an n-ary tree approach (folders and subfolders, etc...). With FAT32, this was a very big deal - but NTFS handles this sort of thing pretty well. That said, splitting into sub-folders opens up some natural parallelization opportunities that are much harder to achieve with a single folder. If you can spawn 10 or 15 threads, each hitting separate folders, then you can effectively eliminate disk latency as a contributing factor.
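    The parallelization idea above can be sketched with a fixed thread pool, one task per subfolder (the class name, thread count, and counting payload are illustrative assumptions; your real task would load or validate cache entries instead of just counting them):

    ```java
    import java.io.IOException;
    import java.nio.file.*;
    import java.util.*;
    import java.util.concurrent.*;

    public class ParallelScan {
        // Scan each immediate subfolder of 'root' on its own worker thread and
        // sum the entry counts. With one folder per thread, disk requests from
        // different workers can overlap instead of serializing.
        static long countAll(Path root, int threads) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            try {
                List<Future<Long>> futures = new ArrayList<>();
                try (DirectoryStream<Path> subdirs =
                         Files.newDirectoryStream(root, Files::isDirectory)) {
                    for (Path sub : subdirs) {
                        futures.add(pool.submit(() -> {
                            long n = 0;
                            try (DirectoryStream<Path> entries = Files.newDirectoryStream(sub)) {
                                for (Path ignored : entries) {
                                    n++;
                                }
                            }
                            return n;
                        }));
                    }
                }
                long total = 0;
                for (Future<Long> f : futures) {
                    total += f.get();
                }
                return total;
            } finally {
                pool.shutdown();
            }
        }
    }
    ```

    Note the parallel win comes from overlapping I/O latency, not CPU; on a single spinning disk the benefit is smaller than on an SSD or a RAID set.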

    I would probably suggest that you start with profiling (you knew that already, of course) and see where the bulk of the load time is coming from. You might be surprised - for example, in one of our apps that does a lot of file-list processing, I was shocked at how much time we were losing to isDirectory() checks; a simple change like doing the date compare before the directory/file determination made a 30% improvement in our iteration speed.
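    The "date compare before directory check" reordering looks like this with the legacy `java.io.File` API (the class name and the stale-file use case are hypothetical; the point is that each of `lastModified()` and `isDirectory()` is a separate native call, so the check that rejects the most entries should run first):

    ```java
    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;

    public class StaleFilter {
        // Return entries last modified before 'cutoff' that are not directories.
        // lastModified() and isDirectory() each cost a native round trip, so the
        // timestamp filter runs first and isDirectory() is only paid for the
        // (usually few) entries that survive it.
        static List<File> staleFiles(File dir, long cutoff) {
            List<File> out = new ArrayList<>();
            File[] entries = dir.listFiles();
            if (entries == null) {
                return out;
            }
            for (File f : entries) {
                if (f.lastModified() < cutoff && !f.isDirectory()) {
                    out.add(f);
                }
            }
            return out;
        }
    }
    ```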
