Slice a PowerShell array into groups of smaller arrays

核能气质少年 submitted on 2019-12-04 14:42:33

You can use ,$x instead of just $x.

The about_Operators section in the documentation has this:

, Comma operator                                                  
   As a binary operator, the comma creates an array. As a unary
   operator, the comma creates an array with one member. Place the
   comma before the member.
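
As a minimal sketch of the difference the unary comma makes when a function outputs an array (Get-Unwrapped and Get-Wrapped are hypothetical names used only for this illustration):

```powershell
# Hypothetical helper functions, named only for this illustration.
function Get-Unwrapped { $x = 1, 2, 3; $x }     # elements are sent one by one
function Get-Wrapped   { $x = 1, 2, 3; , $x }   # the array is sent as one object

# Count how many objects each function sends through the pipeline.
$unwrappedCount = (Get-Unwrapped | Measure-Object).Count
$wrappedCount   = (Get-Wrapped   | Measure-Object).Count

"Without comma: $unwrappedCount object(s)"   # 3
"With comma:    $wrappedCount object(s)"     # 1
```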

For the sake of completeness:

function Slice-Array
{

    [CmdletBinding()]
    param (
        [Parameter(Mandatory=$true, Position=0, ValueFromPipeline=$True)]
        [String[]]$Item,
        [int]$Size=10
    )
    BEGIN { $Items=@()}
    PROCESS {
        foreach ($i in $Item ) { $Items += $i }
    }
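    END {
        # Ceiling with a 1-based range yields exactly one iteration per chunk, so no
        # empty trailing array is emitted when the count is an exact multiple of $Size.
        if ($Items.Count -gt 0) {
            1..[math]::Ceiling($Items.Count/$Size) | ForEach-Object {
                # Split off the first $Size items; the unary comma outputs them as one array.
                $x, $Items = $Items[0..($Size-1)], $Items[$Size..$Items.Length]; ,$x
            }
        }
    }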
}

Usage:

@(0,1,2,3,4,5,6,7,8,9) | Slice-Array -Size 3 | ForEach-Object { "IDs: $($_ -Join ",")" }

cls
$ids=@(0,1,2,3,4,5,6,7,8,9)
$size=3

<# 
Manual Selection:
    $ids | Select-Object -First 3 -Skip 0
    $ids | Select-Object -First 3 -Skip 3
    $ids | Select-Object -First 3 -Skip 6
    $ids | Select-Object -First 3 -Skip 9
#>

# Select via looping
$idx = 0
while (($size * $idx) -lt $ids.Length) {
    # Skip the groups already emitted, then take the next $size elements.
    $group = $ids | Select-Object -First $size -Skip ($size * $idx)
    $group -join ","
    $idx++
}

To add an explanation to Bill Stewart's effective solution:

Outputting a collection such as an array[1] either implicitly or using return sends its elements individually through the pipeline; that is, the collection is enumerated (unrolled):

# Count objects received.
PS> (1..3  | Measure-Object).Count
3   # Array elements were sent *individually* through the pipeline.

Using the unary form of , (comma; the array-construction operator) to prevent enumeration is a conveniently concise, though somewhat obscure workaround:

PS> (, (1..3) | Measure-Object).Count 
1   # By wrapping the array in a helper array, the original array was preserved.

That is, , <collection> creates a transient single-element helper array around the original collection so that the enumeration is only applied to the helper array, outputting the enclosed original collection as-is, as a single object.

A conceptually clearer, but more verbose and slower approach is to use Write-Output -NoEnumerate, which clearly signals the intent to output a collection as a single object.

PS> (Write-Output -NoEnumerate (1..3) | Measure-Object).Count 
1   # Write-Output -NoEnumerate prevented enumeration.

Pitfall with respect to visual inspection:

On outputting for display, the boundaries between multiple arrays are seemingly erased again:

PS> (1..2), (3..4) # Output two arrays without enumeration
1
2
3
4

That is, even though two 2-element arrays were each sent as a single object, the output, by showing each element on its own line, makes it look like a flat 4-element array was received.

A simple way around that is to stringify each array, which turns each array into a string containing a space-separated list of its elements.

PS> (1..2), (3..4) | ForEach-Object { "$_" }
1 2
3 4

Now it is obvious that two separate arrays were received.


[1] What data types are enumerated:
Instances of data types that implement the IEnumerable interface are automatically enumerated, but there are exceptions:
Types that also implement IDictionary, such as hashtables, are not enumerated, and neither are XmlNode instances.
Conversely, instances of DataTable (which doesn't implement IEnumerable) are enumerated (as the elements of their .Rows collection); see the source code.
Additionally, note that stdout output from external programs is enumerated line by line.
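
A quick way to check two of these rules, as a sketch using the same Measure-Object counting technique as above (the variable names are only for illustration):

```powershell
# A hashtable implements IDictionary, so it travels through the pipeline as one object.
$hashtableCount = (@{ a = 1; b = 2; c = 3 } | Measure-Object).Count

# A generic list implements IEnumerable (and not IDictionary), so it is enumerated.
$list = [System.Collections.Generic.List[int]] (1, 2, 3)
$listCount = ($list | Measure-Object).Count

"Hashtable: $hashtableCount object(s)"   # 1
"List:      $listCount object(s)"        # 3
```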

Craig himself has conveniently wrapped the splitting (partitioning) functionality in a robust function, Slice-Array.

Let me offer a better-performing evolution of it (PSv3+ syntax, renamed to Split-Array), which:

  • more efficiently collects the input objects using an extensible System.Collections.Generic.List[object] collection.

  • doesn't modify the collection during splitting, and instead extracts ranges of elements from it.

function Split-Array {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory, ValueFromPipeline)]
        [String[]] $InputObject
        ,
        [ValidateRange(1, [int]::MaxValue)]
        [int] $Size = 10
    )
    begin   { $items = New-Object System.Collections.Generic.List[object] }
    process { $items.AddRange($InputObject) }
    end {
      # Step through the collected items in increments of $Size; the last chunk may be shorter,
      # and inputs with fewer than $Size items yield a single short chunk.
      for ($start = 0; $start -lt $items.Count; $start += $Size) {
        $length = [Math]::Min($Size, $items.Count - $start)
        # The unary comma outputs each chunk as a single array.
        , $items.GetRange($start, $length).ToArray()
      }
    }
}

With small input collections, the optimization won't matter much, but once you get into the thousands of elements, the speed-up can be dramatic.

To give a rough sense of the performance improvement, using Time-Command (a custom helper function, not a built-in cmdlet):

$ids = 0..1e4 # 10,001 numbers (0 through 10,000)
$size = 3 # chunk size

Time-Command { $ids | Split-Array -size $size }, # optimized
             { $ids | Slice-Array -size $size }  # original

Sample result from a single-core Windows 10 VM with Windows PowerShell 5.1 (the absolute times aren't important, but the factors are):

Command                        Secs (10-run avg.) TimeSpan         Factor
-------                        ------------------ --------         ------
$ids | Split-Array -size $size 0.150              00:00:00.1498207 1.00
$ids | Slice-Array -size $size 10.382             00:00:10.3820590 69.30

Note how the unoptimized function was almost 70 times slower.
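
If Time-Command isn't available, the built-in Measure-Command cmdlet can give a rough idea of where the difference comes from. The following is only a sketch, with an arbitrarily chosen element count, that times collecting items with array += versus List.Add; actual timings vary by machine and PowerShell version, so no figures are shown:

```powershell
$count = 5000   # arbitrary element count, chosen only for illustration

# Growing an array with += allocates a new array on every addition.
$arrayTime = Measure-Command {
    $a = @()
    foreach ($i in 1..$count) { $a += $i }
}

# A generic list grows in place, so each Add is cheap.
$listTime = Measure-Command {
    $l = New-Object System.Collections.Generic.List[object]
    foreach ($i in 1..$count) { $l.Add($i) }
}

"Array +=: {0:N1} ms" -f $arrayTime.TotalMilliseconds
"List.Add: {0:N1} ms" -f $listTime.TotalMilliseconds
```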
