Retrieving files from a directory that contains a large number of files

♀尐吖头ヾ submitted on 2019-11-27 03:27:13
Haris Hasan

Have you tried the EnumerateFiles method of the DirectoryInfo class?

As MSDN says:

The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of FileInfo objects before the whole collection is returned; when you use GetFiles, you must wait for the whole array of FileInfo objects to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.

In .NET 4.0, Directory.EnumerateFiles(...) returns IEnumerable<string> (rather than the string[] returned by Directory.GetFiles(...)), so it can stream entries rather than buffer them all; for example:

foreach(var file in Directory.EnumerateFiles(path)) {
    // each path is available as soon as it is read, before the whole listing finishes
}

You are hitting a limitation of the Windows file system itself. When the number of files in a directory grows very large (and 14M is way beyond that threshold), accessing the directory becomes incredibly slow. It doesn't really matter whether you read one file at a time or 1,000 at a time; the cost is in the directory access itself.

One way to solve this is to create subdirectories and split your files into groups. If each directory holds 1,000 to 5,000 files (that range is a guess; experiment to find the actual numbers), you should get decent performance when opening, creating, and deleting files.

This is why applications like Doxygen, which create a file for every class, follow this scheme and put everything into two levels of subdirectories with random names.
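A minimal sketch of the two-level scheme in C#, assuming hash-derived (rather than truly random) directory names so that a file's location can be recomputed from its name alone. `ShardedStore`, `GetShardedPath`, and `WriteSharded` are hypothetical names, and MD5 is used here only as a cheap, well-distributed hash, not for security.

```csharp
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: places each file under root/xx/yy/, where xx and yy are
// the first two bytes of an MD5 hash of the file name, written as hex.
public static class ShardedStore
{
    public static string GetShardedPath(string rootDir, string fileName)
    {
        using var md5 = MD5.Create();
        byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(fileName));
        string level1 = hash[0].ToString("x2"); // first level: "00" to "ff"
        string level2 = hash[1].ToString("x2"); // second level: "00" to "ff"
        return Path.Combine(rootDir, level1, level2, fileName);
    }

    public static void WriteSharded(string rootDir, string fileName, string contents)
    {
        string fullPath = GetShardedPath(rootDir, fileName);
        // CreateDirectory creates any missing parents and does nothing if the directory exists.
        Directory.CreateDirectory(Path.GetDirectoryName(fullPath));
        File.WriteAllText(fullPath, contents);
    }
}
```

With two hex digits per level there are 256 × 256 = 65,536 leaf directories, so 14M files would average about 214 files per directory; a single level of 256 directories would still leave roughly 55,000 files in each.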

Use the Win32 API FindFirstFile/FindNextFile functions to do it without blocking the app.

You can also call Directory.GetFiles in a System.Threading.Tasks.Task (TPL) to prevent your UI from freezing.
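A minimal sketch of that idea; `BackgroundListing` and `GetFilesInBackgroundAsync` are hypothetical names. `Task.Run` moves the blocking `Directory.GetFiles` call onto a thread-pool thread, so a UI event handler can `await` it.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class BackgroundListing
{
    // Hypothetical helper: runs the blocking Directory.GetFiles call on a
    // thread-pool thread and hands back a Task the caller can await.
    public static Task<string[]> GetFilesInBackgroundAsync(string path)
    {
        return Task.Run(() => Directory.GetFiles(path));
    }
}
```

Inside an `async` event handler this would be used as `string[] files = await BackgroundListing.GetFilesInBackgroundAsync(folderPath);` (with `folderPath` standing in for your directory). This keeps the UI responsive but does not make the listing itself any faster; the directory access cost is unchanged.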

Enjoy.

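    using System.Collections.Generic;
    using System.IO;
    using System.Linq;

    // Returns the full paths of at most numberOfFilesToReturn files from the folder.
    public List<string> LoadPathToAllFiles(string pathToFolder, int numberOfFilesToReturn)
    {
        var dirInfo = new DirectoryInfo(pathToFolder);
        // EnumerateFiles is lazy, so Take stops enumerating once enough files have been yielded.
        return dirInfo.EnumerateFiles().Take(numberOfFilesToReturn).Select(f => f.FullName).ToList();
    }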

I run into this problem of accessing a large number of files in a single directory quite often. Subdirectories are a good option, but sometimes even they soon stop helping much. What I do now is create an index file: a text file containing the names of all the files in the directory (this works when I am the one creating the files in that directory). I then read the index file and open the actual files from the directory for processing.
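A minimal sketch of the index-file approach, assuming the application is the only writer to the folder. `IndexedFolder`, `AddFile`, `ReadIndex`, and the `index.txt` file name are hypothetical, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: every file created through AddFile also has its name
// appended to a plain-text index, so later reads use the index instead of a
// directory listing.
public sealed class IndexedFolder
{
    private readonly string _folder;
    private readonly string _indexPath;

    public IndexedFolder(string folder, string indexFileName = "index.txt")
    {
        _folder = folder;
        _indexPath = Path.Combine(folder, indexFileName);
        Directory.CreateDirectory(folder);
    }

    public void AddFile(string fileName, string contents)
    {
        File.WriteAllText(Path.Combine(_folder, fileName), contents);
        // One name per line; AppendAllText creates the index if it does not exist yet.
        File.AppendAllText(_indexPath, fileName + Environment.NewLine);
    }

    public IEnumerable<string> ReadIndex()
    {
        if (!File.Exists(_indexPath)) yield break;
        // File.ReadLines streams the index line by line instead of loading it all at once.
        foreach (string name in File.ReadLines(_indexPath))
        {
            if (name.Length > 0) yield return Path.Combine(_folder, name);
        }
    }
}
```

The index stays correct only if every file is added through `AddFile`; a crash between writing a data file and appending its name would leave that file out of the index, and a data file literally named `index.txt` would clash with the index itself.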
