How to speed up Powershell Get-Childitem over UNC

被撕碎了的回忆 2020-11-28 10:33

DIR or GCI is slow in PowerShell, but fast in CMD. Is there any way to speed this up?

In CMD.exe, after a sub-second delay, this responds a

4 Answers
  • 2020-11-28 11:15

    Here's an interactive reader that parses the output of cmd /c dir (which can handle UNC paths) and collects the three properties most people care about: full path, size, and timestamp.

    usage would be something like $files_with_details = $faster_get_files.GetFileList($unc_compatible_folder)

    and there's a helper function to check combined size $faster_get_files.GetSize($files_with_details)

    $faster_get_files = New-Module -AsCustomObject -ScriptBlock {
        #$DebugPreference = 'Continue' #verbose, this will take figuratively forever
        #$DebugPreference = 'SilentlyContinue'
        $directory_filter = "Directory of (.+)"
        $file_filter = "(\d+/\d+/\d+)\s+(\d+:\d+ \w{2})\s+([\d,]+)\s+(.+)" # [1] is day, [2] is time (AM/PM), [3] is size,  [4] is filename
        $extension_filter = "(.+)[\.](\w{3,4})" # [1] is leaf, [2] is extension
        $directory = ""
        function GetFileList ($directory = $this.directory) {
            if ([System.IO.Directory]::Exists($directory)) {
                # Gather raw file list
                Write-Information "Gathering files..."
                $files_raw = cmd /c dir "$directory" /s /a-d
    
                # Parse file list
                Write-Information "Parsing file list..."
                $files_with_details = foreach ($line in $files_raw) {
                    Write-Debug "starting line {$($line)}"
                    Switch -regex ($line) {
                        $this.directory_filter {
                            $directory = $matches[1]
                            break
                        }
                        $this.file_filter {
                            Write-Debug "parsing matches {$($matches.Values -join ";")}"
                            $date     = $matches[1]
                            $time     = $matches[2] # am/pm style
                            $size     = $matches[3]
                            $filename = $matches[4]
    
                            # we do a second match here so as to not append a fake period to files without an extension, otherwise we could do a single match up above
                            Write-Debug "parsing extension from {$($filename)}"
                            if ($filename -match $this.extension_filter) {
                                $file_leaf = $matches[1]
                                $file_extension = $matches[2]
                            } else {
                                $file_leaf = $filename
                                $file_extension = ""
                            }
                            [pscustomobject][ordered]@{
                                "fullname"  = [string]"$($directory)\$($filename)"
                                "filename"  = [string]$filename
                                "folder"    = [string]$directory
                                "file_leaf" = [string]$file_leaf
                                "extension" = [string]$file_extension
                                "date"      = get-date "$($date) $($time)"
                                "size"      = [int64]($size -replace ',') # strip thousands separators before casting
                            }
                            break
                        } 
                    } # finish directory/file test
                } # finish all files
                return $files_with_details
            } #finish directory exists test
            else { throw "Directory not found" } # directory doesn't exist
        }
        function GetSize($files_with_details) {
            $combined_size = ($files_with_details|measure -Property size -sum).sum
            $pretty_size_gb = "$([math]::Round($combined_size / 1GB, 4)) GB"
            return $pretty_size_gb
        }
        Export-ModuleMember -Function * -Variable *
    }
    
  • 2020-11-28 11:19

    Here is a good explanation by Lee Holmes of why Get-ChildItem is slow. Note the comment from "Anon 11 Mar 2010 11:11 AM" at the bottom of the page; his solution might work for you.

    Anon's Code:

    # SCOPE: SEARCH A DIRECTORY FOR FILES (W/WILDCARDS IF NECESSARY)
    # Usage:
    # $directory = "\\SERVER\SHARE"
    # $searchterms = "filename[*].ext"
    # PS> $Results = Search $directory $searchterms
    
    [reflection.assembly]::loadwithpartialname("Microsoft.VisualBasic") | Out-Null
    
    Function Search {
      # Parameters $Path and $SearchString
      param ([Parameter(Mandatory=$true, ValueFromPipeline = $true)][string]$Path,
      [Parameter(Mandatory=$true)][string]$SearchString
      )
      try {
        #.NET FindInFiles Method to Look for file
        # BENEFITS : Possibly running as background job (haven't looked into it yet)
    
        [Microsoft.VisualBasic.FileIO.FileSystem]::GetFiles(
        $Path,
        [Microsoft.VisualBasic.FileIO.SearchOption]::SearchAllSubDirectories,
        $SearchString
        )
      } catch { $_ }
    
    }
    
  • 2020-11-28 11:23

    Okay, this is how I'm doing it, and it seems to work.

    $files = cmd /c "$GETFILESBAT \\$server\logs\$filemask"
    foreach( $f in $files ) {
        if( $f.length -gt 0 ) {
            select-string -Path $f -pattern $regex | foreach-object { $_ }
        }
    }
    

    Then $GETFILESBAT points to this:

    @dir /a-d /b /s %1
    @exit
    

    I'm writing and deleting this BAT file from the PowerShell script, so I guess it's a PowerShell-only solution, but it doesn't use only PowerShell.
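    The write-and-delete step isn't shown above; a minimal sketch of how it might look (the temp-file location and here-string are my assumptions, not the author's actual script):

    ```powershell
    # Hypothetical: create the helper BAT in the temp folder, call it, then remove it.
    # $server and $filemask are assumed to be defined earlier in the script.
    $GETFILESBAT = Join-Path $env:TEMP 'getfiles.bat'
    @"
    @dir /a-d /b /s %1
    @exit
    "@ | Out-File -FilePath $GETFILESBAT -Encoding ascii

    $files = cmd /c "$GETFILESBAT \\$server\logs\$filemask"

    Remove-Item $GETFILESBAT  # clean up the temporary BAT file
    ```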

    My preliminary performance metrics show this to be eleventy-thousand times faster.

    I tested gci vs. cmd dir vs. FileIO.FileSystem.GetFiles from @Shawn Melton's referenced link.

    The bottom line is that, for daily use on local drives, GetFiles is the fastest. By far. CMD DIR is respectable. Once you introduce a slower network connection with many files, CMD DIR is slightly faster than GetFiles. Then Get-ChildItem... wow, this ranges from not too bad to horrible, depending on the number of files involved and the speed of the connection.

    Some test runs. I've moved GCI around in the tests to make sure the results were consistent.

    10 iterations of scanning c:\windows\temp for *.tmp files

    .\test.ps1 "c:\windows\temp" "*.tmp" 10
    GetFiles ... 00:00:00.0570057
    CMD dir  ... 00:00:00.5360536
    GCI      ... 00:00:01.1391139
    

    GetFiles is 10x faster than CMD dir, which itself is more than 2x faster than GCI.

    10 iterations of scanning c:\windows\temp for *.tmp files with recursion

    .\test.ps1 "c:\windows\temp" "*.tmp" 10 -recurse
    GetFiles ... 00:00:00.7020180
    CMD dir  ... 00:00:00.7644196
    GCI      ... 00:00:04.7737224
    

    GetFiles is a little faster than CMD dir, and both are almost 7x faster than GCI.

    10 iterations of scanning an on-site server on another domain for application log files

    .\test.ps1 "\\closeserver\logs\subdir" "appname*.*" 10
    GetFiles ... 00:00:00.3590359
    CMD dir  ... 00:00:00.6270627
    GCI      ... 00:00:06.0796079
    

    GetFiles is about 2x faster than CMD dir, itself 10x faster than GCI.

    One iteration of scanning a distant server on another domain for application log files, with many files involved

    .\test.ps1 "\\distantserver.company.com\logs\subdir" "appname.2011082*.*"
    CMD dir  ... 00:00:00.3340334
    GetFiles ... 00:00:00.4360436
    GCI      ... 00:11:09.5525579
    

    CMD dir is fastest going to the distant server with many files, but GetFiles is respectably close. GCI on the other hand is a couple of thousand times slower.

    Two iterations of scanning a distant server on another domain for application log files, with many files

    .\test.ps1 "\\distantserver.company.com\logs\subdir" "appname.20110822*.*" 2
    CMD dir  ... 00:00:00.9360240
    GetFiles ... 00:00:01.4976384
    GCI      ... 00:22:17.3068616
    

    More or less linear increase as test iterations increase.

    One iteration of scanning a distant server on another domain for application log files, with fewer files

    .\test.ps1 "\\distantserver.company.com\logs\othersubdir" "appname.2011082*.*" 10
    GetFiles ... 00:00:00.5304170
    CMD dir  ... 00:00:00.6240200
    GCI      ... 00:00:01.9656630
    

    Here GCI is not too bad, GetFiles is 3x faster, and CMD dir is close behind.

    Conclusion

    GCI needs a -raw or -fast option that does not try to do so much. In the meantime, GetFiles is a healthy alternative that is only occasionally a little slower than CMD dir, and usually faster (due to spawning CMD.exe?).

    For reference, here's the test.ps1 code.

    param ( [string]$path, [string]$filemask, [switch]$recurse=$false, [int]$n=1 )
    [reflection.assembly]::loadwithpartialname("Microsoft.VisualBasic") | Out-Null
    write-host "GetFiles... " -nonewline
    $dt = get-date;
    for($i=0;$i -lt $n;$i++){
      if( $recurse ){ [Microsoft.VisualBasic.FileIO.FileSystem]::GetFiles( $path,
          [Microsoft.VisualBasic.FileIO.SearchOption]::SearchAllSubDirectories,$filemask
        )  | out-file ".\testfiles1.txt"}
      else{ [Microsoft.VisualBasic.FileIO.FileSystem]::GetFiles( $path,
          [Microsoft.VisualBasic.FileIO.SearchOption]::SearchTopLevelOnly,$filemask
        )  | out-file ".\testfiles1.txt" }}
    $dt2=get-date;
    write-host $dt2.subtract($dt)
    write-host "CMD dir... " -nonewline
    $dt = get-date;
    for($i=0;$i -lt $n;$i++){
      if($recurse){
        cmd /c "dir /a-d /b /s $path\$filemask" | out-file ".\testfiles2.txt"}
      else{ cmd /c "dir /a-d /b $path\$filemask" | out-file ".\testfiles2.txt"}}
    $dt2=get-date;
    write-host $dt2.subtract($dt)
    write-host "GCI... " -nonewline
    $dt = get-date;
    for($i=0;$i -lt $n;$i++){
      if( $recurse ) {
        get-childitem "$path\*" -include $filemask -recurse | out-file ".\testfiles0.txt"}
      else {get-childitem "$path\*" -include $filemask | out-file ".\testfiles0.txt"}}
    $dt2=get-date;
    write-host $dt2.subtract($dt)
    
  • 2020-11-28 11:33

    I tried some of the suggested methods with a large number of files (~190,000). As mentioned in Kyle's comment, GetFiles isn't very useful here, because it takes nearly forever.

    cmd dir was better than Get-ChildItem in my first tests, but it seems GCI speeds up a lot if you use the -Force parameter. With it, the time needed was about the same as for cmd dir.

    P.S.: In my case I had to exclude most of the files because of their extension. This was done with -Exclude in gci and with a | where in the other commands, so the results for just searching files might differ slightly.
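    A sketch of the two variants being compared (the share path and extension are illustrative, not from the original tests):

    ```powershell
    # GCI: -Force also returns hidden/system files, and sped up UNC enumeration here;
    # -Exclude filters out the unwanted extension during enumeration.
    $gci_files = Get-ChildItem '\\server\share' -Recurse -Force -Exclude '*.tmp'

    # cmd dir: the same filtering done afterwards with a | where
    $dir_files = cmd /c 'dir /a-d /b /s \\server\share' |
        Where-Object { $_ -notmatch '\.tmp$' }
    ```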
