Strange result from String.Split()

半阙折子戏 2021-01-15 02:59

Why does the following result in an array with 7 elements, 5 of them blank? I'd expect only 2 elements. Where are the 5 blank elements coming from?

$a = 'OU=RA


        
4 Answers
  • 2021-01-15 03:29

    String.Split() is character oriented. It treats O, U and = as three separate separators rather than splitting on the string OU= as a whole.

    Think of it as intending to be used for 1,2,3,4,5. If you had ,2,3,4, it would imply there were empty spaces at the start and end. If you had 1,2,,,5 it would imply two empty spaces in the middle.
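
    The comma analogy can be checked directly. A minimal sketch (the variable names are illustrative, and the explicit [char[]] cast is only there so that the call behaves the same in Windows PowerShell and in PowerShell 6 and above):

    ```powershell
    # Separators at the very start and end produce empty entries at both edges.
    $edges = ',2,3,4,'.Split([char[]] ',')
    $edges.Count    # 5: '', '2', '3', '4', ''

    # Three adjacent separators produce two empty entries in the middle.
    $middle = '1,2,,,5'.Split([char[]] ',')
    $middle.Count   # 5: '1', '2', '', '', '5'
    ```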

    You can see with something like:

    PS C:\> $a = 'OU=RAH,OU=RAC'
    PS C:\> $a.Split('RAH')
    OU=
    
    
    ,OU=
    
    C
    

    The empty entries come from the gaps in R_A_H and R_A. When a separator sits at the very start or end of the string, Split also introduces empty entries at the start/end.

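    PowerShell's -split operator is string oriented: it treats the separator as one whole pattern (a regular expression by default), not as a set of characters.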

    PS D:\t> $a = 'OU=RAH,OU=RAC'
    
    PS D:\t> $a -split 'OU='
    
    RAH,
    RAC
    

    You might do better to split on the comma, then replace out OU=, or vice versa, e.g.

    PS D:\t> $a = 'OU=RAH,OU=RAC'
    
    PS D:\t> $a.Replace('OU=','').Split(',')
    RAH
    RAC
    
  • 2021-01-15 03:31

    It splits the string on each character in the separator. So it's splitting on 'O', 'U' and '='.

    As @mklement0 has commented, my earlier answer would not work in all cases. So here is an alternate way to get the expected items.

    $a.Split(',') |% { $_.Split('=') |? { $_ -ne 'OU' } }
    

    This code will split the string, first on , then each item will be split on = and ignore the items that are OU, eventually returning the expected values:

    RAH
    RAC
    

    This will work even in case of:

    $a = 'OU=FOO,OU=RAH,OU=RAC'
    

    generating 3 items FOO, RAH & RAC
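
    For readability, the same pipeline can be written with the aliases % and ? spelled out. This is only a sketch of the approach above applied to the three-item input; $values is an illustrative name, and the [char[]] casts are an addition to keep the behavior identical across PowerShell editions:

    ```powershell
    $a = 'OU=FOO,OU=RAH,OU=RAC'

    # Split on commas first, then split each item on '=' and drop the 'OU' keys.
    $values = $a.Split([char[]] ',') | ForEach-Object {
        $_.Split([char[]] '=') | Where-Object { $_ -ne 'OU' }
    }
    $values   # FOO, RAH, RAC
    ```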

    To get only 2 strings as expected, you could use the following line:

    $a.Split('OU=', [System.StringSplitOptions]::RemoveEmptyEntries)

    which will give the output:

    RAH,
    RAC

    And if you use (note the comma in the separator):

    $a.Split(',OU=', [System.StringSplitOptions]::RemoveEmptyEntries)

    you will get:

    RAH
    RAC

    This is probably what you want. :)

  • 2021-01-15 03:36

    Never mind. Just realised it looks for strings on either side of 'O', 'U', and '='. There are therefore 5 empty strings (in front of the first 'O', between 'O' and 'U', between 'U' and '=', between the second 'O' and 'U', between the second 'U' and '=').
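
    A quick way to confirm that count (a sketch; $parts is an illustrative name, and the [char[]] cast makes PowerShell 6 and above split per character the way Windows PowerShell does):

    ```powershell
    $parts = 'OU=RAH,OU=RAC'.Split([char[]] 'OU=')
    $parts.Count                                   # 7 elements in total
    @($parts | Where-Object { $_ -eq '' }).Count   # 5 of them are empty strings
    @($parts | Where-Object { $_ -ne '' })         # the 2 non-empty ones: 'RAH,' and 'RAC'
    ```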

  • 2021-01-15 03:43

    In order to split by strings (rather than a set of characters) and/or regular expressions, use PowerShell's -split operator:

    PS> ('OU=RAH,OU=RAC' -split ',?OU=') -ne ''  # parentheses not strictly needed
    RAH
    RAC
    
    • -split by default interprets its RHS as a regular expression, and ,?OU= matches both OU= by itself and ,OU=, resulting in the desired splitting, returning the tokens as an array.

      • For all features supported by -split, including literal string matching, limiting the number of tokens returned, and use of script blocks, see Get-Help about_split.
    • Since the input starts with a match, however, -split considers the first element of the split to be the empty string. By passing the resulting array of tokens to -ne '', we filter out these empty strings.
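
    The -split features mentioned above can be sketched as follows. The sample strings other than the OU one are made up for illustration; the SimpleMatch option and the max-substrings argument are the ones described in about_Split:

    ```powershell
    # Regex matching (the default): '.' matches any character, so every
    # character acts as a separator and only empty strings remain.
    $regexParts = 'a.b.c' -split '.'
    $regexParts.Count      # 6

    # Literal matching: SimpleMatch treats '.' as a plain dot
    # (0 as the second argument means "no limit").
    $literalParts = 'a.b.c' -split '.', 0, 'SimpleMatch'
    $literalParts          # a, b, c

    # Limiting the number of tokens: at most 2 substrings are returned.
    $limited = 'a,b,c,d' -split ',', 2
    $limited               # 'a' and 'b,c,d'

    # Filtering out the leading empty token with -ne ''.
    $tokens = ('OU=RAH,OU=RAC' -split ',?OU=') -ne ''
    $tokens                # RAH, RAC
    ```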


    By contrast, the .NET String.Split() method that you tried works very differently in Windows PowerShell (which uses the .NET Framework, FullCLR, up to 4.x):

    'OU=RAH,OU=RAC'.Split('OU=')
    

    OU= is interpreted as an array of characters, any of which individually acts as a separator, irrespective of the order in which the characters are specified. Leading, adjacent, and trailing separators are by default considered to separate empty tokens, so you get an array of 7 tokens:
    @( '', '', '', 'RAH,', '', '', 'RAC')

    Note to PowerShell Core users (PowerShell versions 6 and above): The .NET Core String.Split() method now does have a scalar [string] overload that looks for an entire string as the separator, which PowerShell Core selects by default; to get the character-array behavior described, you must cast to [char[]] explicitly:
    'OU=RAH,OU=RAC'.Split([char[]] 'OU=')


    If you construct the .Split() method call carefully, you can specify strings, but note that you still don't get regular-expression support:

    PS> 'OU=RAH,OU=RAC'.split([string[]] 'OU=', 'RemoveEmptyEntries')
    RAH,
    RAC
    

    This works to split by the literal string OU=, removing empty entries, but as you can see, it doesn't let you account for the , (comma).

    You can take this further by specifying an array of strings to split by, which works in this simple case, but ultimately doesn't give you the same flexibility as the regular expressions that PowerShell's -split operator provides:

    PS> 'OU=RAH,OU=RAC'.split([string[]] ('OU=', ',OU='), 'RemoveEmptyEntries')
    RAH
    RAC
    

    Note that specifying an (array of) strings requires the 2-argument form of the method call, meaning you must also specify a System.StringSplitOptions enumeration value. Use 'None' to not apply any options (as of this writing, the only true option that is supported is 'RemoveEmptyEntries', as used above).
    (The type-safe way to specify an option is to use, e.g., [System.StringSplitOptions]::None; however, passing the option name as a string, e.g., 'None', is a convenient shortcut.)
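
    The effect of the two option values can be compared side by side. A sketch using the type-safe enumeration form (the variable names are illustrative):

    ```powershell
    $dn = 'OU=RAH,OU=RAC'

    # None keeps the empty token produced by the leading 'OU='.
    $all = $dn.Split([string[]] 'OU=', [System.StringSplitOptions]::None)
    $all.Count        # 3: '', 'RAH,', 'RAC'

    # RemoveEmptyEntries drops it.
    $nonEmpty = $dn.Split([string[]] 'OU=', [System.StringSplitOptions]::RemoveEmptyEntries)
    $nonEmpty.Count   # 2: 'RAH,', 'RAC'
    ```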
