The script below combines DFS settings with a Robocopy command to list the 32 biggest files on the file servers.
I take it you mean the command works when you run it in an RDP session, but not from your PowerShell code.
$noEmpties = [StringSplitOptions]::RemoveEmptyEntries
(( robocopy /L /E /ndl /njh /njs /np /nc /bytes C:\temp2 nocopy |
ForEach-Object{ [Int64]$_.Split(" `t", $noEmpties)[0] } |
Sort-Object -Descending )[0..31] |
Measure-Object -Sum).Sum / 1gb
Above is a simplification of what you were doing. It should run a little faster with fewer pipes and Select-Object commands. You might also consider the /MT:x robocopy argument. I've had mixed logging results with multi-threading in the past; however, in testing this scenario it seems to work. Of course, that's only if performance is a concern.
Note: I'm assuming performance is a concern; otherwise Get-ChildItem would be a lot easier to write.
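A minimal sketch of what the /MT variant might look like, assuming the same example path (the thread count is arbitrary):

```powershell
# /MT:16 uses 16 threads (plain /MT defaults to 8); path and count are examples.
# Note that multi-threaded output can interleave, which may complicate parsing.
robocopy /L /E /NDL /NJH /NJS /NP /NC /BYTES /MT:16 C:\temp2 nocopy
```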
The $matches approach was working, but it's complicated to read. I also added /NP and /NC to the robocopy command to make parsing a little easier.
Now of course it's only going to emit a number. The number is the sum of the largest 32 files.
I'm also not sure you need the first ForEach-Object; I think you can go directly to the Select-Object command.
If you have problems beyond this, I think you should look at what's going on inside the expression as it runs. The different results are probably due to different conditions at run time; for example, $_ may be different. Try putting a breakpoint in your code, or use the editor to step through, testing all the values and expressions as you go. That may help identify the problem.
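As a hedged sketch, breakpoints can be set with PowerShell's built-in Set-PSBreakpoint cmdlet (the script path, line number, and variable name are placeholders):

```powershell
# Break at a specific line, or whenever $noEmpties is read; paths are placeholders.
Set-PSBreakpoint -Script .\YourScript.ps1 -Line 5
Set-PSBreakpoint -Script .\YourScript.ps1 -Variable noEmpties -Mode Read
.\YourScript.ps1    # execution drops into the debugger when a breakpoint hits
```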
Update:
I don't have a DFS resource to test your exact scenario, but I fed a custom object to your original code and it did work.
I used the same approach to test a sugared-up version of my earlier approach:
$noEmpties = [StringSplitOptions]::RemoveEmptyEntries
$Props =
@(
@{ n = 'Server - IP'; e = { "$($_.ComputerName) [$((Resolve-DnsName -Name $_.ComputerName -Type A).IPAddress)]" } }
@{ n = 'Staging Path Quota GB'; e = { ( $_.StagingPathQuotaInMB / 1000 ) } },
@{
n = 'Top 32 Largest Files Size'
e = {
( (Robocopy /L /E /NDL /NJH /NJS /NP /NC /Bytes C:\temp2 nocopy |
ForEach-Object{ [Int64]$_.Split(" `t", $noEmpties)[0] } |
                Sort-Object)[-1..-32] |
                Measure-Object -Sum).Sum / 1gb
}
}
'GroupName'
'ContentPath'
'State'
)
$results = Get-DfsrMembership |
Select-Object $Props |
Sort-Object 'Top 32 Largest Files Size'
This seemed to work. For my own study, I prefabricated the expressions in an array before executing the main pipeline; that's just a code-segregation choice. In a case like this, improving readability will go a long way while debugging. Use your favorite segregation approach; it could just as easily be moved to a function and called from the expression.
Note: Your original expression was working in my tests.
At one point I did get all 0's returned, and it was because I had failed to assign [StringSplitOptions]::RemoveEmptyEntries to $noEmpties. That further makes me think something unexpected is happening in the expression. I can't quite put my finger on it, but you can resort to debugging if it's still an issue, or if my samples show the same problem in your environment.
Update:
Appreciate that you accepted @Theo's fine answer, but there are a few things I want to point out. While I'm still not sure why certain remote conditions were yielding zeros, all of my tests were local, so you could have used my expression with Theo's Invoke-Command approach. The reason I mention it: my approach has a compounding performance advantage.
When run across roughly 5,000 files, Theo's approach averaged 501 ms and mine averaged 465 ms. An otherwise insignificant difference of 36 ms could compound quite a bit across the 3-4 million files you mentioned.
That isn't the fastest approach I came up with, though; check this out:
$noEmpties = [StringSplitOptions]::RemoveEmptyEntries
[Int64[]]$Sizes =
Robocopy /L /E /NDL /NJH /NJS /NP /NC /BYTES C:\temp2 nocopy |
ForEach-Object{ $_.Split(" `t", $noEmpties)[0] }
[Int64[]]::Sort($Sizes)
(($Sizes[-1..-32] | Measure-Object -Sum).Sum) / 1gb
This is really cool. By type-constraining the array, I forced all the values to [Int64], so there's no need to convert them on the fly. I then used the static Sort method on the [Int64[]] array class, which turned out to be faster than Sort-Object; I did find documentation confirming that too. I believe the array-slicing approach is generally faster than Select-Object, but I found no advantage to replacing Measure-Object with any kind of manual sum loop.
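If you want to verify the sorting difference on your own machine, here's a rough sketch using synthetic data instead of robocopy output (timings will vary):

```powershell
# Compare the static Array sort against Sort-Object on 50,000 random Int64 values.
$rand = [Random]::new()
[Int64[]]$a = foreach ($i in 1..50000) { $rand.Next() }
[Int64[]]$b = $a.Clone()

(Measure-Command { [Array]::Sort($a) }).TotalMilliseconds        # static sort
(Measure-Command { $b = $b | Sort-Object }).TotalMilliseconds    # pipeline sort
```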
Note: I suspect the .Split() approach will help with your other question, although there may also be a regex-based approach.
Now, in either approach I was able to eke out still more performance by using .SubString() instead of .Split(). This is a little tricky because some of the whitespace characters are tabs and some are spaces.
[Int64[]]$Sizes = Robocopy /L /E /NDL /NJH /NJS /NP /NC /Bytes C:\temp2 nocopy | ForEach-Object{ $_.Substring(0,14) }
[Int64[]]::Sort($Sizes)
(($Sizes[-1..-32] | Measure-Object -Sum).Sum) / 1gb
There were a couple of seemingly random cases where this didn't work, but overall it was reliable. If anything, you may have to play with the string index used. The .Split() approach is more reliable, but I wanted to add this example out of interest on the performance angle.
One final thing: you actually can use Get-ChildItem:
((Get-ChildItem \\?\C:\temp2 -File -Recurse |
ForEach-Object{ $_.Length } |
Sort-Object)[-1..-32] |
Measure-Object -Sum).Sum/1gb
However, this is considerably slower, averaging around 1230 ms over the same set of approximately 5000 files. You can find additional information about the \\?\ prefix syntax around the web.
I agree with Steven in thinking your regex could be simpler, and that you probably don't need the first ForEach-Object in your code.
I cannot try this myself, but perhaps this is a faster alternative for you:
@{ n = 'Top 32 Largest Files Size'; e = {
(robocopy /L /E /NDL /NJH /NJS /NP /NODCOPY /BYTES $_.ContentPath 'NoDestination' | ForEach-Object {
[int64]([regex]'(?i)New File\s*(\d+)').Match($_).Groups[1].Value
} | Sort-Object -Descending | Select-Object -First 32 | Measure-Object -Sum).Sum / 1GB }
}
This doesn't need an if to check each returned line from robocopy, because lines that do not match the regex will yield a value of 0.
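As an illustration, here is how one matching line would parse (the sample line is fabricated to mimic robocopy's output format):

```powershell
$line = '    New File         123456    C:\temp2\example.dat'   # fabricated sample
[int64]([regex]'(?i)New File\s*(\d+)').Match($line).Groups[1].Value   # 123456
```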
Again, sorry I cannot test this myself, but perhaps it would be better to let the servers themselves do the heavy lifting of calculating the sizes, especially because, from your description, I understand that running the code in an RDP session on each server individually works.
In this case, you do need the first ForEach-Object loop.
Can you please try:
$scriptBlock = {
param ([string]$Path)
(robocopy /L /E /NDL /NJH /NJS /NP /NODCOPY /BYTES $Path 'NoDestination' | ForEach-Object {
[int64]([regex]'(?i)New File\s*(\d+)').Match($_).Groups[1].Value
} | Sort-Object -Descending | Select-Object -First 32 | Measure-Object -Sum).Sum / 1GB
}
$results = Get-DfsrMembership | ForEach-Object {
Write-Host "Retrieving Top 32 Largest Files Size from server $($_.ComputerName).."
# get the calculated size from the server
# because of the large number of files, this may take some time..
$size = Invoke-Command -ComputerName $_.ComputerName -ScriptBlock $scriptBlock -ArgumentList $_.ContentPath
[PsCustomObject]@{
'Server - IP' = "$($_.ComputerName) [$((Resolve-DnsName -Name $_.ComputerName -Type A).IPAddress)]"
'Staging Path Quota GB' = ($_.StagingPathQuotaInMB / 1024)
'Top 32 Largest Files Size' = $size
'GroupName' = $_.GroupName
'ContentPath' = $_.ContentPath
'State' = $_.State
}
}
$results | Sort-Object 'Top 32 Largest Files Size'
It is quite possible you need to add the -Credential parameter to the Invoke-Command cmdlet.
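A minimal sketch of what that could look like, prompting once up front and reusing the credential for every server (the message text is just a placeholder):

```powershell
# Prompt once, then pass the same credential to each remote call in the loop.
$cred = Get-Credential -Message 'Account with rights on the DFS servers'
$size = Invoke-Command -ComputerName $_.ComputerName -Credential $cred `
            -ScriptBlock $scriptBlock -ArgumentList $_.ContentPath
```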