PowerShell calling Robocopy to get top 32 largest files total size not working

猫巷女王i 2021-01-29 01:45

The below script is a combination of DFS settings and also the Robocopy command to list the top 32 biggest files in the file servers.

I need to execute the below code aga

2 Answers
  • 2021-01-29 02:32

    I think you mean it works when you run it in the RDP session but it's not working in the PowerShell code.

    $noEmpties = [StringSplitOptions]::RemoveEmptyEntries
    
    (( robocopy /L /E /ndl /njh /njs /np /nc /bytes C:\temp2 nocopy | 
    ForEach-Object{ [Int64]$_.Split(" `t", $noEmpties)[0] } | 
    Sort-object -Descending )[0..31] | 
    Measure-Object -Sum).Sum /1gb
    

    Above is a simplification of what you were doing. It should run a little faster, with fewer pipeline stages and Select-Object commands. If performance is a concern, you might also consider robocopy's /MT:x argument. I've had mixed logging results with multi-threading in the past, but in testing this scenario it seems to work.

    Note: I'm assuming performance is a concern; otherwise Get-ChildItem would be a lot easier to write.

    The $matches approach was working, but it's complicated to read. I also added /np and /nc to the robocopy command to make parsing a little easier.

    Now of course it's only going to emit a number. The number is the sum of the largest 32 files.
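
    To make the parsing step concrete, here is a minimal sketch that applies the same split, sort, and sum logic to lines that have already been captured. The function name Get-TopFileSizeSum, its parameters, and the sample lines are hypothetical; the samples only approximate the shape of robocopy's /bytes /nc output. The separators are cast to [char[]] because, as far as I know, PowerShell 7 can otherwise bind a plain string to a .Split() overload that treats the whole string as a single separator.

    ```powershell
    function Get-TopFileSizeSum {
        param(
            [string[]]$Lines,   # raw robocopy output lines (hypothetical parameter)
            [int]$Top = 32      # how many of the largest files to add up
        )
        # Split on space OR tab, regardless of PowerShell version
        $separators = [char[]]" `t"
        $noEmpties  = [StringSplitOptions]::RemoveEmptyEntries

        $sizes = foreach ($line in $Lines) {
            # The first non-empty token should be the size in bytes
            $first = $line.Split($separators, $noEmpties) | Select-Object -First 1
            # Skip blank lines and anything that is not purely numeric
            if ($first -match '^\d+$') { [Int64]$first }
        }

        ($sizes | Sort-Object -Descending |
            Select-Object -First $Top |
            Measure-Object -Sum).Sum
    }

    # Illustrative sample lines, not real robocopy output
    $sample = @(
        "`t    `t        1000`tC:\temp2\a.txt"
        ""
        "`t    `t        2500`tC:\temp2\b.txt"
        "`t    `t         300`tC:\temp2\c.txt"
    )
    $sum = Get-TopFileSizeSum -Lines $sample -Top 2
    ```

    The result is in bytes; divide it by 1gb, as in the expression above, to get GB.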

    I'm also not sure you need the first ForEach, I think you can go directly to the Select-Object command...

    If you have problems beyond this, I think you should look at what's going on inside the expression when it runs. The different results are probably due to different conditions at run time; for example, $_ may be different. Try setting a breakpoint in your code, or use the editor to step through it, checking the values and expressions as you go. That may help identify the problem.


    Update:

    I don't have a DFS resource to test your exact scenario, but I fed a custom object to your original code and it did work.

    I used the same approach to test a sugary version of my earlier approach:

    $noEmpties = [StringSplitOptions]::RemoveEmptyEntries
    
    $Props =
    @(
        @{ n = 'Server - IP'; e = { "$($_.ComputerName) [$((Resolve-DnsName -Name $_.ComputerName -Type A).IPAddress)]" } }
        @{ n = 'Staging Path Quota GB'; e = { ( $_.StagingPathQuotaInMB / 1000 ) } },
        @{ 
            n = 'Top 32 Largest Files Size'
            e = {
                ( (Robocopy /L /E /NDL /NJH /NJS /NP /NC /Bytes C:\temp2 nocopy | 
                ForEach-Object{ [Int64]$_.Split(" `t", $noEmpties)[0] } | 
                Sort-object)[-1..-32] | 
                Measure-Object -Sum).Sum /1gb 
                }
        }
        'GroupName'
        'ContentPath'
        'State'
    )
    
    $results = Get-DfsrMembership  | 
    Select-Object $Props |
    Sort-Object 'Top 32 Largest Files Size'
    

    This seemed to work. For my own study I prefabricated the expressions in an array before executing the main pipeline. That is just a code segregation approach. In a case like this improving readability will go a long way while debugging. Use your favorite segregation approach; it could just as easily be moved to a function and called from the expression.
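
    As a rough sketch of the function variant, the robocopy expression could be wrapped like this and then called from the calculated property. The function name Get-TopFilesSizeGB and its parameters are hypothetical, and passing $_.ContentPath assumes the objects in the pipeline carry that property, as the Get-DfsrMembership output does.

    ```powershell
    function Get-TopFilesSizeGB {
        param(
            [Parameter(Mandatory)][string]$Path,  # folder to scan
            [int]$Top = 32                        # number of largest files to sum
        )
        $noEmpties = [StringSplitOptions]::RemoveEmptyEntries

        # /L lists only, so nothing is copied; 'nocopy' is a dummy destination
        ((robocopy /L /E /NDL /NJH /NJS /NP /NC /BYTES $Path nocopy |
            ForEach-Object { [Int64]$_.Split([char[]]" `t", $noEmpties)[0] } |
            Sort-Object -Descending |
            Select-Object -First $Top |
            Measure-Object -Sum).Sum) / 1GB
    }

    # The calculated property now only names the function it calls
    $Props = @(
        @{ n = 'Top 32 Largest Files Size'; e = { Get-TopFilesSizeGB -Path $_.ContentPath } }
        'ContentPath'
    )
    ```

    With this in place the main pipeline stays short, and the function can be run and debugged on its own against a single path.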

    Note: Your original expression was working in my tests.

    At one point I did get all 0s returned, and it was because I had failed to assign [StringSplitOptions]::RemoveEmptyEntries to $noEmpties. That further makes me think something unexpected is happening in the expression. I can't quite put my finger on it, but you can resort to debugging if it's still an issue, or if my samples show the same problem in your environment.


    Update:

    I appreciate that you accepted @Theo's fine answer, but there are a few things I want to point out. While I'm still not sure why certain remote conditions were yielding zeros, all of my tests were local, so you could have used my expression with Theo's Invoke-Command approach. The reason I mention it: my approach has a performance advantage that compounds.

    When run across a small set of about 5000 files, Theo's approach averaged 501 ms and mine averaged 465 ms. An otherwise insignificant difference of 36 ms could compound quite a bit across the 3-4 million files you mentioned.
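
    The post does not say how these averages were taken. One way to reproduce this kind of comparison is to run each candidate several times under Measure-Command and average the results. The helper name Measure-AverageMs and the run count are assumptions for illustration only.

    ```powershell
    function Measure-AverageMs {
        param(
            [Parameter(Mandatory)][scriptblock]$ScriptBlock,  # code to time
            [int]$Runs = 10                                   # number of repetitions
        )
        # Collect the elapsed milliseconds of each run
        $times = foreach ($i in 1..$Runs) {
            (Measure-Command -Expression $ScriptBlock).TotalMilliseconds
        }
        ($times | Measure-Object -Average).Average
    }

    # Example: time a trivial pipeline; swap in the robocopy expressions to compare them
    $avg = Measure-AverageMs -ScriptBlock { 1..1000 | Sort-Object -Descending } -Runs 5
    "{0:N1} ms on average" -f $avg
    ```

    Timings from a single run tend to be noisy because of file system caching, so several runs per candidate give a fairer picture.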

    That isn't the fastest approach I came up with, though. Check this out:

    $noEmpties = [StringSplitOptions]::RemoveEmptyEntries
    
    [Int64[]]$Sizes = 
    Robocopy /L /E /NDL /NJH /NJS /NP /NC /BYTES C:\temp2 nocopy | 
    ForEach-Object{ $_.Split(" `t", $noEmpties)[0] } 
    [Int64[]]::Sort($Sizes)
    (($Sizes[-1..-32] | Measure-Object -Sum).Sum) / 1gb
    

    This is really cool. By type-constraining the array I forced all the values to be [Int64], so there's no need to convert them on the fly. I then used the static Sort method on the [Int64[]] array class, which turned out to be faster than Sort-Object. I did find documentation confirming that too. I believe the array slicing approach is generally faster than Select-Object, but I found no advantage to replacing Measure-Object with any kind of manual sum loop.
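
    The three pieces (type constraint, static sort, negative-index slice) can be seen in isolation on a small made-up data set. The values below are hypothetical, and the sketch calls the sort through [Array], the class where the static Sort method is defined.

    ```powershell
    # String values, as they would come out of the Split step (made-up data)
    $raw = '300', '1000', '2500', '42'

    # The type constraint converts every element to Int64 on assignment
    [Int64[]]$Sizes = $raw

    # In-place ascending sort; Sort is a static method of System.Array
    [Array]::Sort($Sizes)

    # Negative indexes count from the end, so this takes the two largest values
    $top2 = $Sizes[-1..-2]
    $sum  = ($top2 | Measure-Object -Sum).Sum
    ```

    Because the sort is ascending and in place, the largest values end up at the end of the array, which is why the slice uses -1..-32 in the real expression.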

    Note: I suspect the .Split() approach will help deal with your other question, although there may also be a RegEx-based approach.

    Now, in either approach I was able to eke out still more performance by using .Substring() instead of the Split approach. This is a little tricky because some of the white-space characters are tabs and some are spaces.

    [Int64[]]$Sizes = Robocopy /L /E /NDL /NJH /NJS /NP /NC /Bytes C:\temp2 nocopy | ForEach-Object{ $_.Substring(0,14) } 
    [Int64[]]::Sort($Sizes)
    (($Sizes[-1..-32] | Measure-Object -Sum).Sum) / 1gb
    

    There were a couple of seemingly random cases where this seemed not to work, but overall it seemed reliable. If anything, you may have to play with the string index referenced. The .Split() approach is more reliable, but I wanted to add this example out of interest on the performance angle.

    One final thing: you actually can use Get-ChildItem:

    ((Get-ChildItem \\?\C:\temp2 -File -Recurse | 
    ForEach-Object{ $_.Length } |
    Sort-Object)[-1..-32] |
    Measure-Object -Sum).Sum/1gb
    

    However, this is considerably slower, averaging around 1230 ms over the same set of approximately 5000 files. You can find additional information about the \\?\ prefix syntax around the web.

  • 2021-01-29 02:37

    I agree with Steven in thinking your regex could be simpler and that you probably don't need the first ForEach-Object in your code.

    I cannot try this myself, but perhaps this is a faster alternative for you:

    @{ n = 'Top 32 Largest Files Size'; e = { 
        (robocopy /L /E /NDL /NJH /NJS /NP /NODCOPY /BYTES $_.ContentPath 'NoDestination' | ForEach-Object {
            [int64]([regex]'(?i)New File\s*(\d+)').Match($_).Groups[1].Value 
        } | Sort-Object -Descending | Select-Object -First 32 | Measure-Object -Sum).Sum / 1GB }
    }
    

    This doesn't need the if to check each line returned from robocopy, because lines that do not match the regex will yield a value of 0.


    Again, sorry I cannot test this myself, but perhaps it would be better to let the servers themselves do the heavy lifting of calculating the sizes, especially because, from your description, I understand that running the code in an RDP session on each server individually works.

    In this case, you do need the first ForEach-Object loop.

    Could you please try:

    $scriptBlock = {
        param ([string]$Path)
        (robocopy /L /E /NDL /NJH /NJS /NP /NODCOPY /BYTES $Path 'NoDestination' | ForEach-Object {
            [int64]([regex]'(?i)New File\s*(\d+)').Match($_).Groups[1].Value 
        } | Sort-Object -Descending | Select-Object -First 32 | Measure-Object -Sum).Sum / 1GB
    }
    
    $results = Get-DfsrMembership | ForEach-Object {
        Write-Host "Retrieving Top 32 Largest Files Size from server $($_.ComputerName).."
        # get the calculated size from the server
        # because of the large number of files, this may take some time..
        $size = Invoke-Command -ComputerName $_.ComputerName -ScriptBlock $scriptBlock -ArgumentList $_.ContentPath
        [PsCustomObject]@{
            'Server - IP'               = "$($_.ComputerName) [$((Resolve-DnsName -Name $_.ComputerName -Type A).IPAddress)]"
            'Staging Path Quota GB'     = ($_.StagingPathQuotaInMB / 1024)
            'Top 32 Largest Files Size' = $size
            'GroupName'                 = $_.GroupName
            'ContentPath'               = $_.ContentPath
            'State'                     = $_.State
        }
    }
    
    $results | Sort-Object 'Top 32 Largest Files Size'
    

    It is quite possible that you need to add the -Credential parameter to the Invoke-Command cmdlet.