Question
I am trying to list the files in all the sub-directories of a root directory with the approach below. But it's taking a long time when the number of files is in the millions. Is there a better approach for doing this?
I am using .NET 3.5, so I can't use the enumerator-based methods :-(
******************* Main *************
DirectoryInfo dir = new DirectoryInfo(path);
DirectoryInfo[] subDir = dir.GetDirectories();
foreach (DirectoryInfo di in subDir) // call for each sub-directory
{
    PopulateList(di.FullName, false);
}
*******************************************
static void PopulateList(string directory, bool IsRoot)
{
    System.Diagnostics.ProcessStartInfo procStartInfo =
        new System.Diagnostics.ProcessStartInfo("cmd", "/c dir /s/b \"" + directory + "\"");
    procStartInfo.RedirectStandardOutput = true;
    procStartInfo.UseShellExecute = false;
    procStartInfo.CreateNoWindow = true;

    System.Diagnostics.Process proc = new System.Diagnostics.Process();
    proc.StartInfo = procStartInfo;
    proc.Start();

    string fileName = directory.Substring(directory.LastIndexOf('\\') + 1);
    StreamWriter writer = new StreamWriter(fileName + ".lst");
    while (!proc.StandardOutput.EndOfStream)
    {
        writer.WriteLine(proc.StandardOutput.ReadLine());
        writer.Flush();
    }
    writer.Close();
}
Answer 1:
Remove all the Process-related code and use the Directory.GetDirectories() and Directory.GetFiles() methods instead:
public IEnumerable<string> GetAllFiles(string rootDirectory)
{
    // Files sitting directly in the root directory
    foreach (var file in Directory.GetFiles(rootDirectory))
    {
        yield return file;
    }

    // Files in every sub-directory, recursively
    foreach (var directory in Directory.GetDirectories(
        rootDirectory,
        "*",
        SearchOption.AllDirectories))
    {
        foreach (var file in Directory.GetFiles(directory))
        {
            yield return file;
        }
    }
}
From MSDN, SearchOption.AllDirectories:
Includes the current directory and all the subdirectories in a search operation. This option includes reparse points like mounted drives and symbolic links in the search.
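With that in place, the question's PopulateList can shrink to something like this. This is only a sketch: the ".lst" output naming mirrors the original code, and Directory.GetFiles with SearchOption.AllDirectories does the recursion in a single call.

```csharp
using System.IO;

static void PopulateList(string directory)
{
    // Last path segment, same as the Substring/LastIndexOf trick in the question
    string name = Path.GetFileName(directory);

    // "using" disposes the writer even if an exception is thrown
    using (var writer = new StreamWriter(name + ".lst"))
    {
        foreach (var file in Directory.GetFiles(
            directory, "*", SearchOption.AllDirectories))
        {
            writer.WriteLine(file);
        }
    }
}
```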
Answer 2:
It will definitely be faster to use DirectoryInfo.GetFiles in a loop for each directory instead of spawning tons of new processes and reading their output.
Answer 3:
With millions of files you're actually running into filesystem limitations (see this and search for "300,000"), so take that into account.
As for optimizations, I think you'd really want to iterate lazily, so you'll have to P/Invoke FindFirstFile/FindNextFile.
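A rough sketch of that P/Invoke approach is below. It is Windows-only, the error handling is deliberately minimal, and FastFileEnumerator is just an illustrative name; the Win32 declarations follow the documented WIN32_FIND_DATA layout. The key property is that each file name is yielded as soon as the OS returns it, without ever materializing the full list in memory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

static class FastFileEnumerator
{
    static readonly IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);

    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    struct WIN32_FIND_DATA
    {
        public FileAttributes dwFileAttributes;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA data);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA data);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool FindClose(IntPtr hFindFile);

    // Walks the tree iteratively (explicit stack) and yields each file path
    // as soon as the OS hands it back -- no big up-front array allocation.
    public static IEnumerable<string> EnumerateFiles(string root)
    {
        var pending = new Stack<string>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            string dir = pending.Pop();
            WIN32_FIND_DATA data;
            IntPtr handle = FindFirstFile(Path.Combine(dir, "*"), out data);
            if (handle == INVALID_HANDLE_VALUE)
                continue; // inaccessible directory -- skip it

            try
            {
                do
                {
                    if (data.cFileName == "." || data.cFileName == "..")
                        continue;
                    string full = Path.Combine(dir, data.cFileName);
                    if ((data.dwFileAttributes & FileAttributes.Directory) != 0)
                        pending.Push(full);  // a directory -- descend later
                    else
                        yield return full;   // a file -- hand it out lazily
                } while (FindNextFile(handle, out data));
            }
            finally
            {
                FindClose(handle); // always release the search handle
            }
        }
    }
}
```

Because this is an iterator, the caller can start writing the first file names to disk immediately instead of waiting for the entire scan to finish.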
Answer 4:
Check out the already-available Directory.GetFiles overload.
For example:
var paths = Directory.GetFiles(root, "*", SearchOption.AllDirectories);
And yes, it will take a long time. But I don't think you can improve its performance using only .NET classes.
Answer 5:
Assuming that your millions of files are spread across multiple sub-directories and you're using .NET 4.0, you could look at the Parallel Extensions.
Using a parallel foreach loop to process the list of sub-directories could make things a lot faster.
The new Parallel Extensions are also a lot safer and easier to use than attempting multi-threading at a lower level.
The one thing to look out for is making sure that you limit the number of concurrent operations to something sensible.
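Under those assumptions (.NET 4.0 or later), a minimal sketch with a capped degree of parallelism might look like the following; the cap of 4 is an arbitrary illustration, and for a spinning disk a lower value may well be faster, since the workload is I/O-bound rather than CPU-bound.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

class ParallelListing
{
    static void Main(string[] args)
    {
        string root = args[0];

        // Cap concurrency so we don't thrash the disk with seeks
        var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

        // One body invocation per top-level sub-directory;
        // each one recurses through its own subtree independently.
        Parallel.ForEach(Directory.GetDirectories(root), options, subDir =>
        {
            string[] files = Directory.GetFiles(
                subDir, "*", SearchOption.AllDirectories);
            Console.WriteLine("{0}: {1} files", subDir, files.Length);
        });
    }
}
```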
Source: https://stackoverflow.com/questions/7596747/c-sharp-how-to-list-the-files-in-a-sub-directory-fast-optimised-way