What does .* do in regex?

余生分开走 2021-02-10 06:52

After an extensive search, I am unable to find an explanation for the need to use .* in regex. For example, MSDN suggests a password regex of

@\\\"(?=.{6,})(?=(.*\\         

3 Answers
  • 2021-02-10 07:32

    .* just means "0 or more of any character"

    It's broken down into two parts:

    • . - a "dot" indicates any character except a newline (unless the Singleline option is set)
    • * - means "0 or more instances of the preceding regex token"
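    To see the two parts in action, here is a small sketch. It uses Python's re module rather than .NET, on the assumption that ., * and the newline behaviour work the same way in both; re.DOTALL plays roughly the role of .NET's RegexOptions.Singleline.

```python
import re

# "." matches any single character except "\n" (by default);
# "*" repeats the preceding token zero or more times.
a_then_z = re.compile(r"a.*z")

print(a_then_z.fullmatch("az") is not None)        # True: zero characters in between
print(a_then_z.fullmatch("a-123!-z") is not None)  # True: any run of characters
print(a_then_z.fullmatch("a\nz") is not None)      # False: "." does not match a newline

# re.DOTALL (roughly .NET's RegexOptions.Singleline) lets "." match "\n" too.
print(re.fullmatch(r"a.*z", "a\nz", re.DOTALL) is not None)  # True
```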

    In your example above, this is important, since they want to force the password to contain a special character and a number while still allowing all other characters. If you used \d instead of .*\d, for example, that portion of the regex would only succeed if a digit appeared at that exact position (\d matches a single decimal digit; in .NET that means [0-9] plus other Unicode digits by default). Similarly, \W instead of .*\W would only succeed if a non-word character appeared at that exact position.
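    A minimal sketch of that difference, again in Python's re module as a stand-in for .NET (assumed to behave the same for these constructs). The sample password is made up; re.match only tries position 0, so each lookahead starts at the first character.

```python
import re

password = "secret7!"  # hypothetical sample password

# Each lookahead is tried only at position 0 because re.match anchors there.
digit_first = re.compile(r"(?=\d)")       # is the first character a digit?
digit_anywhere = re.compile(r"(?=.*\d)")  # is there a digit anywhere (single-line input)?

print(digit_first.match(password) is not None)     # False: "s" is not a digit
print(digit_anywhere.match(password) is not None)  # True: ".*" skips over "secret"
```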

    A good reference for these tokens in .NET can be found on MSDN: Regular Expression Language - Quick Reference

    Also, if you're really looking to delve into regex, take a look at http://www.regular-expressions.info/. While it can sometimes be difficult to find what you're looking for on that site, it's one of the most complete and beginner-friendly regex references I've seen online.

  • 2021-02-10 07:38

    The .* portion allows literally any combination of characters. In effect, it lets the user add any amount of extra content to the password on top of the characters you require.

    Note: I don't think that MSDN page is actually suggesting that as a password validator. It is just providing an example of a possible one.

  • 2021-02-10 07:47

    Just FYI, that regex doesn't do what they say it does, and the way it's written is needlessly verbose and confusing. They say it's supposed to match more than seven characters, but it really matches as few as six. And while the other two lookaheads correctly match at least one each of the required character types, they can be written much more simply.
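    The "as few as six" claim is easy to check. This sketch uses Python's re module instead of .NET (an assumption: these lookaheads behave the same in both), with the lookahead part of the MSDN pattern stripped of its string-literal wrapping and two invented sample strings.

```python
import re

# Lookaheads from the MSDN example, without the XML / C# string-literal wrapping.
original = re.compile(r"(?=.{6,})(?=(.*\d){1,})(?=(.*\W){1,})")

# Every part of the pattern is zero-width, so a match at position 0 means all
# three conditions hold for a single-line string.
for candidate in ["ab1!cd", "ab1!c"]:
    ok = original.match(candidate) is not None
    print(f"{candidate!r} ({len(candidate)} chars): {ok}")
# 'ab1!cd' (6 chars): True   <- accepted, although it is not "more than seven"
# 'ab1!c' (5 chars): False
```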

    Finally, the string you copied isn't just a regex, it's an XML attribute value (including the enclosing quotes) that seems to represent a C# string literal (except the closing quote is missing). I've never used a Membership object, but I'm pretty sure that syntax is faulty. In any case, the actual regex is:

    (?=.{6,})(?=(.*\d){1,})(?=(.*\W){1,})
    

    ...but it should be:

    (?=.{8,})(?=.*\d)(?=.*\W)
    

    The first lookahead tries to match eight or more characters of any kind. If it succeeds, the match position (or cursor, if you prefer) is reset to the beginning and the second lookahead scans for a digit. If it finds one, the cursor is reset again and the third lookahead scans for a special character. (Which, by the way, includes whitespace, control characters, and a boatload of other esoteric characters; probably not what the author intended.)
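    A quick check of the corrected pattern, sketched in Python's re module (assumed to behave like .NET here) with a handful of made-up sample passwords. The last sample illustrates the point about \W: a plain space is enough to satisfy it.

```python
import re

# Corrected pattern: 8+ characters, at least one digit, at least one non-word character.
corrected = re.compile(r"(?=.{8,})(?=.*\d)(?=.*\W)")

# Hypothetical sample passwords with the reason each passes or fails.
samples = {
    "abc1!x": "too short (6 chars)",
    "abcdefg1": "no non-word character",
    "abcdef!g": "no digit",
    "abcd1!ef": "meets all three conditions",
    "pass word1": r"the space satisfies \W",
}

for pw, note in samples.items():
    ok = corrected.match(pw) is not None
    print(f"{pw!r}: {ok} ({note})")
```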

    If you left the .* out of the latter two lookaheads, you would have (?=\d) asserting that the first character is a digit, and (?=\W) asserting that it's not a digit. (Digits are classed as word characters, and \W matches anything that's not a word character.) Together those two assertions could never succeed. The .* in each lookahead causes it to initially gobble up the whole string, then backtrack, giving back one character at a time until it reaches a spot where the \d or \W can match. That's how they can match the digit and the special character anywhere in the string.
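    Both effects can be observed directly. The sketch below uses Python's re module as a stand-in for .NET, and the sample strings are invented for illustration. The last part shows the greedy backtracking: .* first takes everything, then gives characters back, so it stops just before the last digit rather than the first.

```python
import re

# Without ".*", both lookaheads inspect the same first character, which
# cannot be a digit (a word character) and a non-word character at once.
no_dot_star = re.compile(r"(?=\d)(?=\W)")
# With ".*", each lookahead may skip ahead to wherever its token occurs.
with_dot_star = re.compile(r"(?=.*\d)(?=.*\W)")

for pw in ["1!", "!1", "abc1!def"]:
    print(pw, no_dot_star.match(pw) is not None, with_dot_star.match(pw) is not None)
# no_dot_star never matches; with_dot_star matches all three strings.

# Greedy ".*" grabs the whole string, then backtracks one character at a
# time until "\d" can match, ending just before the LAST digit.
m = re.match(r"(.*)\d", "ab1cd2ef")
print(m.group(1))  # ab1cd
```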
