Fastest way to find a full path of a given file via Powershell?

Posted on 2020-12-06 05:03:42

Question


I need to write a Powershell snippet that finds the full path(s) for a given filename over a complete partition as fast as possible.

For the sake of better comparison, I am using these global variables in all code samples:

$searchDir  = "c:\"
$searchName = "hosts"

I started with a small snippet using Get-ChildItem to get a first baseline:

"get-ChildItem"
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$result = Get-ChildItem -LiteralPath $searchDir -Filter $searchName -File -Recurse -ea 0
write-host $timer.Elapsed.TotalSeconds "sec."

The runtime on my SSD was 14,8581609 sec.

Next, I ran the classic DIR command to see how much it improves things:

"dir"
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$result = &cmd /c dir "$searchDir$searchName" /b /s /a-d
$timer.Stop()
write-host $timer.Elapsed.TotalSeconds "sec."

This finished in 13,4713342 sec. - not bad, but can we get it faster?

In the third iteration I tested the same task with ROBOCOPY. Here is the code sample:

"robocopy"
$timer = [System.Diagnostics.Stopwatch]::StartNew()

$roboDir = [System.IO.Path]::GetDirectoryName($searchDir)
if (!$roboDir) {$roboDir = $searchDir.Substring(0,2)}

$info = [System.Diagnostics.ProcessStartInfo]::new()
$info.FileName = "$env:windir\system32\robocopy.exe"
$info.RedirectStandardOutput = $true
$info.Arguments = " /l ""$roboDir"" null ""$searchName"" /bytes /njh /njs /np /nc /ndl /xjd /mt /s"
$info.UseShellExecute = $false
$info.CreateNoWindow = $true
$info.WorkingDirectory = $searchDir

$process = [System.Diagnostics.Process]::new()
$process.StartInfo = $info
[void]$process.Start()
$process.WaitForExit()

$timer.Stop()
write-host $timer.Elapsed.TotalSeconds "sec."

Or in a shorter version (based on the helpful comments):

"robocopy v2"
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$fileList = (&cmd /c pushd $searchDir `& robocopy /l "$searchDir" null "$searchName" /ns /njh /njs /np /nc /ndl /xjd /mt /s).trim() -ne ''
$timer.Stop()
write-host $timer.Elapsed.TotalSeconds "sec."

Was it faster than DIR? Yes, absolutely! The runtime is now down to 3,2685551 sec. The main reason for this huge improvement is that the /mt switch makes ROBOCOPY run multithreaded, processing the tree with multiple parallel workers. But even without this turbo switch it was faster than DIR.
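Just to illustrate what the /mt switch contributes, here is the same one-liner again without it (identical to the "robocopy v2" snippet above, only the switch removed, so ROBOCOPY runs single-threaded; timings will of course vary per system):

"robocopy v2, single-threaded"
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$fileList = (&cmd /c pushd $searchDir `& robocopy /l "$searchDir" null "$searchName" /ns /njh /njs /np /nc /ndl /xjd /s).trim() -ne ''
$timer.Stop()
write-host $timer.Elapsed.TotalSeconds "sec."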

Mission accomplished? Not really - my task was to create a PowerShell script that finds a file as fast as possible, and calling ROBOCOPY is a bit of cheating.

Next, I wanted to see how fast we can get by using [System.IO.Directory]. The first try uses the GetFiles and GetDirectories calls. Here is my code:

"GetFiles"
$timer = [System.Diagnostics.Stopwatch]::StartNew()

$fileList = [System.Collections.Generic.List[string]]::new()
$dirList = [System.Collections.Generic.Queue[string]]::new()
$dirList.Enqueue($searchDir)
while ($dirList.Count -ne 0) {
    $dir = $dirList.Dequeue()
    try {
        $files = [System.IO.Directory]::GetFiles($dir, $searchName)
        if ($files) {$fileList.AddRange($files)}
        foreach($subdir in [System.IO.Directory]::GetDirectories($dir)) {
            $dirList.Enqueue($subDir)
        }
    } catch {}
}
$timer.Stop()
write-host $timer.Elapsed.TotalSeconds "sec."

This time the runtime was 19,3393872 sec. - by far the slowest code. Can we do better? Here is a code snippet using the Enumerate calls for comparison:

"EnumerateFiles"
$timer = [System.Diagnostics.Stopwatch]::StartNew()

$fileList = [System.Collections.Generic.List[string]]::new()
$dirList = [System.Collections.Generic.Queue[string]]::new()
$dirList.Enqueue($searchDir)
while ($dirList.Count -ne 0) {
    $dir = $dirList.Dequeue()
    try {
        foreach($file in [System.IO.Directory]::EnumerateFiles($dir, $searchName)) {
            $fileList.add($file)
        }
        foreach ($subdir in [System.IO.Directory]::EnumerateDirectories($dir)) {
            $dirList.Enqueue($subDir)
        }
    } catch {}
}

$timer.Stop()
write-host $timer.Elapsed.TotalSeconds "sec."

It was only slightly faster, with a runtime of 19,2068545 sec.

Now let's see if we can get faster with direct WinAPI calls into Kernel32. Here is the code - let's see how fast it is this time:

"WinAPI"
add-type -Name FileSearch -Namespace Win32 -MemberDefinition @"
    public struct WIN32_FIND_DATA {
        public uint dwFileAttributes;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
    public static extern IntPtr FindFirstFile
      (string lpFileName, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
    public static extern bool FindNextFile
      (IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
    public static extern bool FindClose(IntPtr hFindFile);
"@

$rootDir = 'c:'
$searchFile = "hosts"

$fileList = [System.Collections.Generic.List[string]]::new()
$dirList = [System.Collections.Generic.Queue[string]]::new()
$dirList.Enqueue($rootDir)
$timer = [System.Diagnostics.Stopwatch]::StartNew()

$fileData = new-object Win32.FileSearch+WIN32_FIND_DATA
while ($dirList.Count -ne 0) {
    $dir = $dirList.Dequeue()
    $handle = [Win32.FileSearch]::FindFirstFile("$dir\*", [ref]$fileData)
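    # skip the "." and ".." entries: FindFirstFile's first result is ignored and the extra FindNextFile call below consumes the second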
    [void][Win32.FileSearch]::FindNextFile($handle, [ref]$fileData)
    while ([Win32.FileSearch]::FindNextFile($handle, [ref]$fileData)) {
        if ($fileData.dwFileAttributes -band 0x10) {
            $fullName = [string]::Join('\', $dir, $fileData.cFileName)
            $dirList.Enqueue($fullName)
        } elseif ($fileData.cFileName -eq $searchFile) {
            $fullName = [string]::Join('\', $dir, $fileData.cFileName)
            $fileList.Add($fullName)
        }
    }
    [void][Win32.FileSearch]::FindClose($handle)
}

$timer.Stop()
write-host $timer.Elapsed.TotalSeconds "sec."

For me, the result of this approach was quite a negative surprise: the runtime is 17,499286 sec. That is faster than the System.IO calls, but still slower than a simple Get-ChildItem.

But - there is still hope to come close to the super-fast result from ROBOCOPY! With Get-ChildItem we cannot make the call execute in multithreaded mode, but with e.g. the Kernel32 calls we have the option to turn this into a recursive function and run each iteration over all subfolders in a PARALLEL foreach loop via embedded C# code. But how to do that?

Does anyone know how to change the last code snippet to use Parallel.ForEach? Even if the result might not be as fast as ROBOCOPY, I would like to post this approach here as well, to have a complete storybook for this classic "file search" topic.

Please let me know how to do the parallel code part.

Update: For completeness, I am adding the code and runtime of the GetFiles approach running on PowerShell 7 with smarter access handling:

"GetFiles PS7"
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$fileList = [system.IO.Directory]::GetFiles(
  $searchDir, 
  $searchFile,
  [IO.EnumerationOptions] @{AttributesToSkip = 'ReparsePoint'; RecurseSubdirectories = $true; IgnoreInaccessible = $true}
)
$timer.Stop()
write-host $timer.Elapsed.TotalSeconds "sec."

The runtime on my system was 9,150673 sec. - faster than DIR, but still slower than robocopy running multithreaded on 8 cores.

Update #2: After playing around with the new PS7 features, I came up with this code snippet, which uses my first (perhaps a bit ugly) parallel approach:

"WinAPI PS7 parallel"
$searchDir  = "c:\"
$searchFile = "hosts"

add-type -Name FileSearch -Namespace Win32 -MemberDefinition @"
    public struct WIN32_FIND_DATA {
        public uint dwFileAttributes;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
    public static extern IntPtr FindFirstFile
      (string lpFileName, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
    public static extern bool FindNextFile
      (IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
    public static extern bool FindClose(IntPtr hFindFile);
"@

$rootDir = $searchDir -replace "\\$"
$maxRunSpaces = [int]$env:NUMBER_OF_PROCESSORS
$fileList = [System.Collections.Concurrent.BlockingCollection[string]]::new()
$dirList = [System.Collections.Concurrent.BlockingCollection[string]]::new()
$dirList.Add($rootDir)
$timer = [System.Diagnostics.Stopwatch]::StartNew()

(1..$maxRunSpaces) | ForEach-Object -ThrottleLimit $maxRunSpaces -Parallel {
    $dirList = $using:dirList
    $fileList = $using:fileList
    $fileData = new-object Win32.FileSearch+WIN32_FIND_DATA
    $dir = $null
    if ($_ -eq 1) {$delay = 0} else {$delay = 50}
    if ($dirList.TryTake([ref]$dir, $delay)) {
        do {
            $handle = [Win32.FileSearch]::FindFirstFile("$dir\*", [ref]$fileData)
            [void][Win32.FileSearch]::FindNextFile($handle, [ref]$fileData)
            while ([Win32.FileSearch]::FindNextFile($handle, [ref]$fileData)) {
                if ($fileData.dwFileAttributes -band 0x10) {
                    $fullName = [string]::Join('\', $dir, $fileData.cFileName)
                    $dirList.Add($fullName)
                } elseif ($fileData.cFileName -eq $using:searchFile) {
                    $fullName = [string]::Join('\', $dir, $fileData.cFileName)
                    $fileList.Add($fullName)
                }
            }
            [void][Win32.FileSearch]::FindClose($handle)
        } until (!$dirList.TryTake([ref]$dir))
    }
}

$timer.Stop()
write-host $timer.Elapsed.TotalSeconds "sec."

The runtime is now very close to the robocopy timing: 4,0809719 sec.

Not bad, but I am still looking for a solution with a Parallel.ForEach approach via embedded C# code, so that it also works in PowerShell v5.

Update #3: Here is my final code for PowerShell 5, running in parallel runspaces:

$searchDir  = "c:\"
$searchFile = "hosts"

"WinAPI parallel"
add-type -Name FileSearch -Namespace Win32 -MemberDefinition @"
    public struct WIN32_FIND_DATA {
        public uint dwFileAttributes;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
    public static extern IntPtr FindFirstFile
      (string lpFileName, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
    public static extern bool FindNextFile
      (IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
    public static extern bool FindClose(IntPtr hFindFile);
"@

$rootDir = $searchDir -replace "\\$"
$maxRunSpaces = [int]$env:NUMBER_OF_PROCESSORS
$fileList = [System.Collections.Concurrent.BlockingCollection[string]]::new()
$dirList = [System.Collections.Concurrent.BlockingCollection[string]]::new()
$dirList.Add($rootDir)
$timer = [System.Diagnostics.Stopwatch]::StartNew()

$runSpaceList = [System.Collections.Generic.List[PSObject]]::new()
$pool = [RunSpaceFactory]::CreateRunspacePool(1, $maxRunSpaces)
$pool.Open()

foreach ($id in 1..$maxRunSpaces) { 
    $runSpace = [Powershell]::Create()
    $runSpace.RunspacePool = $pool
    [void]$runSpace.AddScript({
        Param (
            [int]$id,
            [string]$searchFile,
            [System.Collections.Concurrent.BlockingCollection[string]]$dirList,
            [System.Collections.Concurrent.BlockingCollection[string]]$fileList
        )
        $fileData = new-object Win32.FileSearch+WIN32_FIND_DATA
        $dir = $null
        if ($id -eq 1) {$delay = 0} else {$delay = 50}
        if ($dirList.TryTake([ref]$dir, $delay)) {
            do {
                $handle = [Win32.FileSearch]::FindFirstFile("$dir\*", [ref]$fileData)
                [void][Win32.FileSearch]::FindNextFile($handle, [ref]$fileData)
                while ([Win32.FileSearch]::FindNextFile($handle, [ref]$fileData)) {
                    if ($fileData.dwFileAttributes -band 0x10) {
                        $fullName = [string]::Join('\', $dir, $fileData.cFileName)
                        $dirList.Add($fullName)
                    } elseif ($fileData.cFileName -like $searchFile) {
                        $fullName = [string]::Join('\', $dir, $fileData.cFileName)
                        $fileList.Add($fullName)
                    }
                }
                [void][Win32.FileSearch]::FindClose($handle)
            } until (!$dirList.TryTake([ref]$dir))
        }
    })
    [void]$runSpace.addArgument($id)
    [void]$runSpace.addArgument($searchFile)
    [void]$runSpace.addArgument($dirList)
    [void]$runSpace.addArgument($fileList)
    $status = $runSpace.BeginInvoke()
    $runSpaceList.Add([PSCustomObject]@{Name = $id; RunSpace = $runSpace; Status = $status})
}

while ($runSpaceList.Status.IsCompleted -notcontains $true) {sleep -Milliseconds 10}
$pool.Close() 
$pool.Dispose()

$timer.Stop()
$fileList
write-host $timer.Elapsed.TotalSeconds "sec."

The overall runtime of 4,8586134 sec. is a bit slower than the PS7 version, but still much faster than any DIR or Get-ChildItem variant. ;-)

Final solution: I was finally able to answer my own question. Here is the final code:

"WinAPI parallel.foreach"

add-type -TypeDefinition @"
using System;
using System.IO;
using System.Collections;
using System.Collections.Generic;
using System.Collections.Concurrent;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;
using System.Text.RegularExpressions;

public class FileSearch {
    public struct WIN32_FIND_DATA {
        public uint dwFileAttributes;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
    public static extern IntPtr FindFirstFile
      (string lpFileName, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
    public static extern bool FindNextFile
      (IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
    public static extern bool FindClose(IntPtr hFindFile);

    static IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);

    public static class Globals {
        public static BlockingCollection<string> resultFileList {get;set;}
    }

    public static BlockingCollection<string> GetTreeFiles(string path, string searchFile) {
        Globals.resultFileList = new BlockingCollection<string>();
        List<string> dirList = new List<string>();
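        // convert the wildcard pattern into an anchored regex: escape ".", turn "*" into ".*" and "?" into "."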
        searchFile = @"^" + searchFile.Replace(@".",@"\.").Replace(@"*",@".*").Replace(@"?",@".") + @"$";
        GetFiles(path, searchFile);
        return Globals.resultFileList;
    }

    static void GetFiles(string path, string searchFile) {
        path = path.EndsWith(@"\") ? path : path + @"\";
        List<string> dirList = new List<string>();
        WIN32_FIND_DATA fileData;
        IntPtr handle = INVALID_HANDLE_VALUE;
        handle = FindFirstFile(path + @"*", out fileData);
        if (handle != INVALID_HANDLE_VALUE) {
            FindNextFile(handle, out fileData);
            while (FindNextFile(handle, out fileData)) {
                if ((fileData.dwFileAttributes & 0x10) > 0) {
                    string fullPath = path + fileData.cFileName;
                    dirList.Add(fullPath);
                } else {
                    if (Regex.IsMatch(fileData.cFileName, searchFile, RegexOptions.IgnoreCase)) {
                        string fullPath = path + fileData.cFileName;
                        Globals.resultFileList.TryAdd(fullPath);
                    }
                }
            }
            FindClose(handle);
            Parallel.ForEach(dirList, (dir) => {
                GetFiles(dir, searchFile);
            });
        }
    }
}
"@

[fileSearch]::GetTreeFiles($searchDir, 'hosts')

And the final runtime of 3,2536388 sec. is now even faster than robocopy. I also posted an optimized version of that code as an answer below.


Answer 1:


tl;dr:

This answer does not try to solve the parallel problem as asked, however:

  • A single, recursive [IO.Directory]::GetFiles() call may be fast enough, though note that if inaccessible directories are involved this is only an option in PowerShell [Core] v6.2+:
# PowerShell [Core] v6.2+
[IO.Directory]::GetFiles(
  $searchDir, 
  $searchFile,
  [IO.EnumerationOptions] @{ AttributesToSkip = 'ReparsePoint'; RecurseSubdirectories = $true; IgnoreInaccessible = $true }
)
  • Pragmatically speaking (outside of, say, a coding exercise), calling robocopy is a perfectly legitimate approach - assuming you only need to run on Windows - which is as simple as (note that con is a dummy argument for the unused target-directory parameter):
(robocopy $searchDir con $searchFile /l /s /mt /njh /njs /ns /nc /ndl /np).Trim() -ne ''

A few points up front:

  • but calling ROBOCOPY is a bit of cheating.

    • Arguably, using .NET APIs / WinAPI calls is just as much cheating as calling an external utility such as RoboCopy (e.g. robocopy.exe /l ...). After all, calling external programs is a core mandate of any shell, including PowerShell (and neither System.Diagnostics.Process nor its PowerShell wrapper, Start-Process, is required for that). That said, while not a problem in this case, you do lose the ability to pass and receive objects when you call an external program, and in-process operations are typically faster.
  • For timing execution of commands (measuring performance), PowerShell offers a high-level wrapper around System.Diagnostics.Stopwatch: the Measure-Command cmdlet.

  • Such performance measurements fluctuate, because PowerShell, as a dynamically resolved language, employs a lot of caches that incur overhead when they're first filled, and you generally won't know when that happens - see this GitHub issue for background information.

  • Additionally, a long-running command that traverses the file system is subject to interference from other processes running at the same time, and whether file-system information has already been cached from a previous run makes a big difference.

  • The following comparison uses a higher-level convenience wrapper, the Time-Command function, which makes it easy to compare the relative runtime performance of multiple commands.


The key to speeding up PowerShell code is to minimize the actual PowerShell code and offload as much of the work as possible to .NET method calls / (compiled) external programs.
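As a small illustration of that principle (my own example, not part of the original answer; C:\Windows\System32 is just used as a conveniently large, readable folder), compare the cmdlet with the roughly equivalent .NET call for a single directory:

# count the files directly inside one folder: cmdlet vs. direct .NET method call
Measure-Command { (Get-ChildItem -LiteralPath C:\Windows\System32 -File).Count }
Measure-Command { [System.IO.Directory]::GetFiles('C:\Windows\System32').Length }

The exact numbers vary, but the direct .NET call typically finishes noticeably faster, because far less per-item PowerShell machinery is involved.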

The following contrasts the performance of:

  • Get-ChildItem (just for contrast, we know that it is too slow)

  • robocopy.exe

  • A single, recursive call to System.IO.Directory.GetFiles(), which may be fast enough for your purposes, despite being single-threaded.

    • Note: The call below uses features only available in .NET Core 2.1+ and therefore works in PowerShell [Core] v6.2+ only. The .NET Framework version of this API doesn't allow ignoring inaccessible directories (due to lack of permission), which makes the enumeration fail if such directories are encountered.
$searchDir = 'C:\'
$searchFile = 'hosts'

# Define the commands to compare as an array of script blocks.
$cmds = 
  { 
    [IO.Directory]::GetFiles(
      $searchDir, 
      $searchFile,
      [IO.EnumerationOptions] @{ AttributesToSkip = 'ReparsePoint'; RecurseSubdirectories = $true; IgnoreInaccessible = $true }
    )
  },
  {
    (Get-ChildItem -Literalpath $searchDir -File -Recurse -Filter $searchFile -ErrorAction Ignore -Force).FullName
  },
  {
    (robocopy $searchDir con $searchFile /l /s /mt /njh /njs /ns /nc /ndl /np).Trim() -ne ''
  } 

Write-Verbose -vb "Warming up the cache..."
# Run one of the commands up front to level the playing field
# with respect to cached filesystem information.
$null = & $cmds[-1]

# Run the commands and compare their timings.
Time-Command $cmds -Count 1 -OutputToHost -vb

On my 2-core Windows 10 VM running PowerShell Core 7.1.0-preview.7 I get the following results; the numbers vary based on a lot of factors (not just the number of files), but should provide a general sense of relative performance (column Factor).

Note that since the file-system cache is deliberately warmed up beforehand, the numbers for a given machine will be too optimistic compared to a run without cached information.

As you can see, the PowerShell [Core] [System.IO.Directory]::GetFiles() call actually outperformed the multi-threaded robocopy call in this case.

VERBOSE: Warming up the cache...
VERBOSE: Starting 1 run(s) of:
    [IO.Directory]::GetFiles(
      $searchDir,
      $searchFile,
      [IO.EnumerationOptions] @{ AttributesToSkip = 'ReparsePoint'; RecurseSubdirectories = $true; IgnoreInaccessible = $true }
    )
  ...
C:\Program Files\Git\etc\hosts
C:\Windows\WinSxS\amd64_microsoft-windows-w..ucture-other-minwin_31bf3856ad364e35_10.0.18362.1_none_079d0d71e24a6112\hosts
C:\Windows\System32\drivers\etc\hosts
C:\Users\jdoe\AppData\Local\Packages\CanonicalGroupLimited.Ubuntu18.04onWindows_79rhkp1fndgsc\LocalState\rootfs\etc\hosts
VERBOSE: Starting 1 run(s) of:
    (Get-ChildItem -Literalpath $searchDir -File -Recurse -Filter $searchFile -ErrorAction Ignore -Force).FullName
  ...
C:\Program Files\Git\etc\hosts
C:\Users\jdoe\AppData\Local\Packages\CanonicalGroupLimited.Ubuntu18.04onWindows_79rhkp1fndgsc\LocalState\rootfs\etc\hosts
C:\Windows\System32\drivers\etc\hosts
C:\Windows\WinSxS\amd64_microsoft-windows-w..ucture-other-minwin_31bf3856ad364e35_10.0.18362.1_none_079d0d71e24a6112\hosts
VERBOSE: Starting 1 run(s) of:
    (robocopy $searchDir con $searchFile /l /s /mt /njh /njs /ns /nc /ndl /np).Trim() -ne ''
  ...
C:\Program Files\Git\etc\hosts
C:\Windows\WinSxS\amd64_microsoft-windows-w..ucture-other-minwin_31bf3856ad364e35_10.0.18362.1_none_079d0d71e24a6112\hosts
C:\Windows\System32\drivers\etc\hosts
C:\Users\jdoe\AppData\Local\Packages\CanonicalGroupLimited.Ubuntu18.04onWindows_79rhkp1fndgsc\LocalState\rootfs\etc\hosts

VERBOSE: Overall time elapsed: 00:01:48.7731236
Factor Secs (1-run avg.) Command
------ ----------------- -------
1.00   22.500            [IO.Directory]::GetFiles(…
1.14   25.602            (robocopy /l $searchDir NUL $searchFile /s /mt /njh /njs /ns /nc /np).Trim() -ne ''
2.69   60.623            (Get-ChildItem -Literalpath $searchDir -File -Recurse -Filter $searchFile -ErrorAction Ignore -Force).FullName




Answer 2:


This is the final code I created. Runtime is now 2,8627695 sec. Limiting the parallelism to the number of logical cores gave better performance than doing a Parallel.ForEach over all subdirectories.

Instead of returning only the file names, you can also return a full FileInfo object per hit in the resulting BlockingCollection (see the short sketch after the code below).

# powershell-sample to find all "hosts"-files on Partition "c:\"

cls
Remove-Variable * -ea 0
[System.GC]::Collect()
$ErrorActionPreference = "stop"

$searchDir  = "c:\"
$searchFile = "hosts"

add-type -TypeDefinition @"
using System;
using System.IO;
using System.Linq;
using System.Collections.Concurrent;
using System.Runtime.InteropServices;
using System.Threading.Tasks;
using System.Text.RegularExpressions;

public class FileSearch {
    public struct WIN32_FIND_DATA {
        public uint dwFileAttributes;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
    static extern IntPtr FindFirstFile
        (string lpFileName, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
    static extern bool FindNextFile
        (IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Ansi)]
    static extern bool FindClose(IntPtr hFindFile);

    static IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);
    static BlockingCollection<string> dirList {get;set;}
    static BlockingCollection<string> fileList {get;set;}

    public static BlockingCollection<string> GetFiles(string searchDir, string searchFile) {
        bool isPattern = false;
        if (searchFile.Contains(@"?") | searchFile.Contains(@"*")) {
            searchFile = @"^" + searchFile.Replace(@".",@"\.").Replace(@"*",@".*").Replace(@"?",@".") + @"$";
            isPattern = true;
        }
        fileList = new BlockingCollection<string>();
        dirList = new BlockingCollection<string>();
        dirList.Add(searchDir);
        int[] threads = Enumerable.Range(1,Environment.ProcessorCount).ToArray();
        Parallel.ForEach(threads, (id) => {
            string path;
            IntPtr handle = INVALID_HANDLE_VALUE;
            WIN32_FIND_DATA fileData;
            if (dirList.TryTake(out path, 100)) {
                do {
                    path = path.EndsWith(@"\") ? path : path + @"\";
                    handle = FindFirstFile(path + @"*", out fileData);
                    if (handle != INVALID_HANDLE_VALUE) {
                        FindNextFile(handle, out fileData);
                        while (FindNextFile(handle, out fileData)) {
                            if ((fileData.dwFileAttributes & 0x10) > 0) {
                                string fullPath = path + fileData.cFileName;
                                dirList.TryAdd(fullPath);
                            } else {
                                if (isPattern) {
                                    if (Regex.IsMatch(fileData.cFileName, searchFile, RegexOptions.IgnoreCase)) {
                                        string fullPath = path + fileData.cFileName;
                                        fileList.TryAdd(fullPath);
                                    }
                                } else {
                                    if (fileData.cFileName == searchFile) {
                                        string fullPath = path + fileData.cFileName;
                                        fileList.TryAdd(fullPath);
                                    }
                                }
                            }
                        }
                        FindClose(handle);
                    }
                } while (dirList.TryTake(out path));
            }
        });
        return fileList;
    }
}
"@

$fileList = [fileSearch]::GetFiles($searchDir, $searchFile)
$fileList
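A minimal sketch of that idea from the PowerShell side (my own addition, not part of the original answer): instead of changing the C# class, the returned path strings can simply be converted to FileInfo objects afterwards:

# wrap each returned full path in a System.IO.FileInfo object ($fileInfoList is just an illustrative name)
$fileInfoList = $fileList | ForEach-Object { [System.IO.FileInfo]$_ }
$fileInfoList | Select-Object FullName, Length, LastWriteTime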


Source: https://stackoverflow.com/questions/63956318/fastest-way-to-find-a-full-path-of-a-given-file-via-powershell
