lookahead and non-capturing regular expressions

无人及你 2021-01-14 07:10

I'm trying to match the local part of an email address before the @ character with:

LOCAL_RE_NOTQUOTED = """
((
\w         # alphanumeric and _
| [!#$%&         


        
1 Answer
  • 2021-01-14 07:17

    You're confusing non-capturing groups (?:...) and lookahead assertions (?=...).

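    The former do participate in the match (and are thus part of match.group(), which contains the overall match); they just don't create a numbered group that you could refer back to later ($1, \1 etc.). The latter only assert that something follows at the current position and consume nothing, so what they look at is not part of the match.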
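To make the difference concrete, here is a minimal Python sketch. The sample address and the two short patterns are made up for illustration and are not the patterns from the question.

```python
import re

text = "john@example.com"  # hypothetical sample input

# Non-capturing group: the "@" is consumed and is part of the overall match;
# it just does not get a group number of its own.
noncapturing = re.match(r"\w+(?:@)", text)

# Lookahead: an "@" must follow, but it is not consumed.
lookahead = re.match(r"\w+(?=@)", text)

print(noncapturing.group())   # john@
print(noncapturing.groups())  # ()
print(lookahead.group())      # john
```

Only the lookahead version stops before the @, which is what matching "the local part before the @" needs.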

    The second problem (Why is the double dot matched?) is a bit trickier. This is because of an error in your regex. You see, when you wrote (shortened to make the point)

    [+-/]
    

    you wrote "Match a character between + and /", and in ASCII the dot lies right between them (ASCII 43-47: +,-./). Therefore the first character class matches the dot, and the lookahead assertion is never reached. You need to place the dash at the end of the character class so that it is treated as a literal dash:

    ((
    \w         # alphanumeric and _
    | [!#$%&'*+/=?^_`{|}~-]          # special chars, but no dot at beginning
    )
    (
    \w         # alphanumeric and _
    | [!#$%&'*+/=?^_`{|}~-]          # special characters
    | ([.](?![.])) # negative lookahead to avoid pairs of dots. 
    )*)
    (?<!\.)(?=@)           # no end with dot before @
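The range behaviour described above can be checked directly. This is a small Python sketch; the variable names are invented for the example.

```python
import re

# "+-/" inside a character class is a range from "+" (43) to "/" (47).
range_class = re.compile(r"[+-/]")
# With the dash placed last it is a literal dash and no range is formed.
literal_class = re.compile(r"[+/-]")

# The five characters with code points 43 through 47.
covered = [chr(code) for code in range(43, 48)]

range_hits = [ch for ch in covered if range_class.fullmatch(ch)]
literal_hits = [ch for ch in covered if literal_class.fullmatch(ch)]

print(covered)       # ['+', ',', '-', '.', '/']
print(range_hits)    # ['+', ',', '-', '.', '/']
print(literal_hits)  # ['+', '-', '/']
```

The range version accepts the dot (and the comma) as a side effect, while the version with the trailing dash accepts only the three characters that were actually listed.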
    

    And of course, if you want to use this logic, you can streamline it a bit:

    ^(?!\.)                   # no dot at the beginning
    (?:
    [\w!#$%&'*+/=?^_`{|}~-]   # alnums or special characters except dot
    | (\.(?![.@]))            # or dot unless it's before a dot or @ 
    )*
    (?=@)                     # end before @
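As a rough check, the streamlined pattern can be compiled in Python with re.VERBOSE, so the whitespace and comments are ignored. The helper name local_part and the sample addresses are assumptions made for this sketch, and the inner capturing group around the dot is dropped because nothing refers back to it.

```python
import re

LOCAL_PART = re.compile(r"""
    ^(?!\.)                      # no dot at the beginning
    (?:
      [\w!#$%&'*+/=?^_`{|}~-]    # alnums or special characters except dot
      | \.(?![.@])               # or dot unless it's before a dot or @
    )*
    (?=@)                        # end before @
""", re.VERBOSE)


def local_part(address):
    """Return the local part of address, or None if the pattern rejects it."""
    m = LOCAL_PART.match(address)
    return m.group() if m else None


print(local_part("john.doe@example.com"))   # john.doe
print(local_part("a+b-c@example.com"))      # a+b-c
print(local_part("john..doe@example.com"))  # None
print(local_part(".john@example.com"))      # None
print(local_part("john.@example.com"))      # None
```

Note that because the group is repeated with *, an address with an empty local part such as "@example.com" also passes; replacing * with + would be the stricter choice if that matters.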
    