Ignoring white space for a Regex match

独厮守ぢ 2020-12-16 22:19

I need to match 8 or more digits, the sequence of which can include spaces.

For example, all of the below would be valid matches:

12345678
1 2345678
         


        
3 Answers
  • 2020-12-16 22:41

    Way later, but this really needs the correct answer on it, and a reason why. Who knew this question could have such a complex answer, right? Lol. But there are plenty of considerations surrounding spacing in regex.

    Firstly: never put a bare space in a regex. Doing so makes your regex unreadable and unmaintainable. Memories of using a mouse to highlight a space to make sure it was only one space come to mind. A run of bare spaces will break your regex, but the same run inside a character class, [    ], won't, because repetition inside a character class is ignored. And if you require an exact number of spaces, you can actually see that in a character class like so: [ ]{3}. Compare the accident you get without the character class: three bare spaces followed by {3} actually looks for 5 spaces (two literal spaces, then the third one repeated three times), whoops!
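
    To make the space-counting point concrete, here is a minimal sketch. It uses Python's re module rather than .NET (an assumption made for illustration; these three patterns behave the same way in both flavors), and the helper name exact_space_counts is made up for this example.

```python
import re

# Three bare spaces then {3}: the quantifier binds only to the last space,
# so this is two literal spaces plus one space repeated three times.
accidental = re.compile("   {3}")
# The character class makes the single space and its count visible.
explicit = re.compile("[ ]{3}")
# Repeating a character inside a class changes nothing: still one space.
repeated_in_class = re.compile("[    ]")

def exact_space_counts(pattern, limit=8):
    """List every n in 0..limit for which n spaces match the whole pattern."""
    return [n for n in range(limit + 1) if pattern.fullmatch(" " * n)]

print(exact_space_counts(accidental))         # [5]
print(exact_space_counts(explicit))           # [3]
print(exact_space_counts(repeated_in_class))  # [1]
```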

    Second: keep the free-spacing (?x) option in mind, which lets you spread a regex out and comment it. You shouldn't have to fear that somebody turning that option on will break your regex because you put bare keyboard spaces in it. Also, (?x) does not ignore a keyboard space when it's inside a character class, like so: [ ]. It is therefore safer to use character classes for your keyboard spaces.
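
    A small sketch of that behaviour, assuming Python's re.VERBOSE flag as a stand-in for (?x) (it follows the same rule: whitespace outside a character class is ignored, whitespace inside one is kept). The names bare_space, class_space and matches are illustrative only.

```python
import re

# Under free-spacing mode the bare space is silently dropped from the pattern.
bare_space = re.compile(r"\d \d", re.VERBOSE)
# Inside a character class the space survives free-spacing mode.
class_space = re.compile(r"\d[ ]\d", re.VERBOSE)

def matches(pattern, text):
    """Return True if the whole text matches the compiled pattern."""
    return pattern.fullmatch(text) is not None

print(matches(bare_space, "1 2"))   # False: the pattern is now just \d\d
print(matches(bare_space, "12"))    # True
print(matches(class_space, "1 2"))  # True: [ ] still requires the space
print(matches(class_space, "12"))   # False
```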

    Third: try not to use \s in this scenario. As Omaghosh points out, it also includes newlines (\r and \n), and the scenario you mentioned wouldn't seem to favor that. However, also as Omaghosh points out, you may want more than just keyboard spaces. So you can use either [ ], [\s-[\r\n]], or [\f\t\v\u00A0\u2028\u2029\u0020], depending on what you fancy. The last two options are close to the same thing (in .NET, \s also covers \x85 and the remaining Unicode separators, so the explicit list is slightly narrower), but character class subtraction only works in .NET and a few other flavors.
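
    The newline problem is easy to see with a quick check. This is a sketch in Python's re module (assumed here in place of .NET; \s includes \r and \n in both), and the sample string and helper name first_match are made up for illustration.

```python
import re

with_s = re.compile(r"(?:\d\s*){8,}")       # \s also accepts \r and \n
with_space = re.compile(r"(?:\d[ ]*){8,}")  # only the ordinary space

# Eight digits, but split across two lines.
text = "1234\n5678"

def first_match(pattern, s):
    """Return the first matched substring, or None if nothing matches."""
    m = pattern.search(s)
    return m.group(0) if m else None

print(repr(first_match(with_s, text)))      # '1234\n5678': spans the line break
print(repr(first_match(with_space, text)))  # None: no line holds eight digits
```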

    Fourth: this is a commonly over-built pattern: (\s*...\s*)*. It doesn't make any sense. Because the group repeats, it is the same as this: (\s*\s*...)* or this: (\s*\s*\s*\s*...)*. The only argument against what I'm saying is that you'd be guaranteed to capture the spaces before the first .... But that is almost never actually wanted. If it is, the most you need is this: \s*(...\s*)*
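
    As a rough check of that equivalence, the sketch below compares the two shapes with \d standing in for the ... placeholder and {8,} for the repetition. It is written with Python's re module as an assumption; the sample strings and the names padded_both, padded_after and accepts are hypothetical.

```python
import re

# Over-built form: optional whitespace on both sides of every digit.
padded_both = re.compile(r"(?:\s*\d\s*){8,}")
# Leaner form: one optional leading run, then whitespace after each digit only.
padded_after = re.compile(r"\s*(?:\d\s*){8,}")

samples = [
    "12345678",
    " 1 2 3 4 5 6 7 8 ",
    "1234567",          # only seven digits
    "12 34 56 78  9",
    "  1234 567",       # only seven digits
    "abc",
]

def accepts(pattern, s):
    """Return True if the pattern matches the entire string."""
    return pattern.fullmatch(s) is not None

# Both patterns accept or reject every sample identically.
for s in samples:
    print(repr(s), accepts(padded_both, s), accepts(padded_after, s))
```

    The doubled \s* in the over-built form also gives the engine more than one way to split each run of whitespace between neighbouring repetitions, which adds backtracking without changing what is accepted.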

    Omaghosh had the closest answer, but this is the shortest correct answer:

    Regex.Match(input, @"(?:\d[ ]*){8,}").Groups[0].Value;
    

    Or the following, if we take the question literally and the six options are in the same text on separate lines:

    Regex.Match(input, @"(?m)^(?:\d[ ]*){8,}$").Groups[0].Value;
    
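    For readers outside .NET, here is a hedged sketch of the same anchored, multiline pattern using Python's re module (an assumption; the pattern text is unchanged). The first two sample lines come from the question; the other two are invented to show rejections.

```python
import re

# (?m) lets ^ and $ match at every line boundary, not just the string ends.
line_pattern = re.compile(r"(?m)^(?:\d[ ]*){8,}$")

text = "\n".join([
    "12345678",      # from the question
    "1 2345678",     # from the question
    "1234 567",      # hypothetical: only seven digits
    "abc 12345678",  # hypothetical: line does not start with a digit
])

# With no capturing groups, findall returns each whole match.
print(line_pattern.findall(text))  # ['12345678', '1 2345678']
```
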

    Or the following, if it is part of a bigger regex and needs a group:

    Regex.Match(input, @"...((?:\d[ ]*){8,})...").Groups[1].Value;
    

    And feel free to replace the [ ] with .NET character class subtraction, or with an explicit whitespace class for non-.NET flavors:

    @"(?:\d[\s-[\r\n]]*){8,}"
    // Or . . .
    @"(?:\d[\f\t\v\u00A0\u2028\u2029\u0020]*){8,}"
    
  • 2020-12-16 22:43
    (\d *){8,}
    

    It matches eight or more occurrences of a digit followed by zero or more spaces. Change it to

    ( *\d *){8,}  # there is a space before the first asterisk
    

    to match strings with spaces at the beginning. Or

    (\s*\d\s*){8,}
    

    to match tabs and other whitespace characters (that includes newlines too).

    Finally, make it a non-capturing group with ?:. Thus it becomes (?:\s*\d\s*){8,}.
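
    The effect of the extra leading space can be checked quickly. This is a sketch in Python's re module (assumed for illustration), written with non-capturing groups; the padded sample string and the names trailing_only, both_sides and whole_string_matches are hypothetical.

```python
import re

trailing_only = re.compile(r"(?:\d *){8,}")  # spaces allowed after each digit
both_sides = re.compile(r"(?: *\d *){8,}")   # spaces allowed before as well

def whole_string_matches(pattern, text):
    """Return True if the pattern matches the entire string."""
    return pattern.fullmatch(text) is not None

padded = "  1234 5678"  # two leading spaces

print(whole_string_matches(trailing_only, padded))  # False: starts with a space
print(whole_string_matches(both_sides, padded))     # True
```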

  • 2020-12-16 22:49
    (\d{8,}\s+)*\d{8,}
    

    should work
