Regex for nested parentheses in Python

灰色年华 2021-01-03 03:41

I have something like this:

Othername California (2000) (T) (S) (ok) {state (#2.1)}

Is there a regex to obtain:

Other         


        
4 answers
  • 2021-01-03 04:00

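    Despite what I said in the comments, I've found a way around it. Note that the pattern relies on a lookahead conditional, (?(?=...)yes|no), which PCRE supports but Python's built-in re module does not: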

    (?(?=\([^()\w]*[\w.]+[^()\w]*\))\([^()\w]*([\w.]+)[^()\w]*\)|.)(?=[^{]*\})|(?<!\()(\b\w+\b)(?!\()|ok
    

    Explanation:

    (?                                  # If
    (?=\([^()\w]*[\w.]+[^()\w]*\))      # There is (anything except [()\w] zero or more times, followed by [\w.] one or more times, followed by anything except [()\w] zero or more times)
    \([^()\w]*([\w.]+)[^()\w]*\)        # Then match it, and put [\w.] in a group
    |                                   # else
    .                                   # advance with one character
    )                                   # End if
    (?=[^{]*\})                         # Look ahead if there is anything except { zero or more times followed by }
    
    |                                   # Or
    (?<!\()(\b\w+\b)(?!\()              # Match a word not enclosed in parentheses
    |                                   # Or
    ok                                  # Match ok
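
Since the question asks about Python, it may help to check whether the standard library accepts this pattern. The sketch below is only a quick check, not part of the original answer; the helper name `compiles` is hypothetical. It relies on the fact that Python's `re` module only supports conditionals that test a group number or name, such as `(?(1)yes|no)`, and not conditionals that test a lookahead.

```python
import re

# The pattern from this answer, unchanged.
pattern = (
    r'(?(?=\([^()\w]*[\w.]+[^()\w]*\))\([^()\w]*([\w.]+)[^()\w]*\)|.)'
    r'(?=[^{]*\})|(?<!\()(\b\w+\b)(?!\()|ok'
)


def compiles(pat):
    """Return True if Python's built-in re module accepts the pattern."""
    try:
        re.compile(pat)
        return True
    except re.error:
        return False


# The lookahead conditional is rejected by the re module.
print(compiles(pattern))            # False

# A conditional on a group number is accepted.
print(compiles(r'(a)?(?(1)b|c)'))   # True
```

The pattern would therefore have to be run in a PCRE-based engine, or rewritten, before it can be used from Python's `re`.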
    


  • 2021-01-03 04:02

    Another option is:

    ^(\w+\s?\w+)\s?\(\d{1,}\)\s?\(\w+\)\s?\(\w+\)\s?\((\w+)\)\s?.*#(\d\.\d)
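
A minimal sketch of how this pattern could be applied in Python 3 to the string from the question. The variable names are illustrative, and the dot between the two final digits is escaped here so that it matches only a literal `.`.

```python
import re

thestr = 'Othername California (2000) (T) (S) (ok) {state (#2.1)}'

# Same pattern as above, with the last dot escaped.
pattern = r'^(\w+\s?\w+)\s?\(\d{1,}\)\s?\(\w+\)\s?\(\w+\)\s?\((\w+)\)\s?.*#(\d\.\d)'

match = re.search(pattern, thestr)

# re.search returns None when nothing matches, so guard before reading groups.
result = match.groups() if match else None
print(result)  # ('Othername California', 'ok', '2.1')
```

Note that this pattern expects exactly a two-word name, four parenthesized items, and a single-digit version on each side of the dot.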
    
  • 2021-01-03 04:04

    Try this one:

    import re
    
    thestr = 'Othername California (2000) (T) (S) (ok) {state (#2.1)}'
    
    regex = r'''
        ([^(]*)             # match anything but a (
        \                   # a space
        (?:                 # non capturing parentheses
            \([^(]*\)       # parentheses
            \               # a space
        ){3}                # three times
        \(([^(]*)\)         # capture fourth parentheses contents
        \                   # a space
        {                   # opening {
            [^}]*           # anything but }
            \(\#            # opening ( followed by #
                ([^)]*)     # match anything but )
            \)              # closing )
        }                   # closing }
    '''
    
    match = re.match(regex, thestr, re.X)
    
    print(match.groups())
    

    Output:

    ('Othername California', 'ok', '2.1')
    

    And here's the compressed version:

    import re
    
    thestr = 'Othername California (2000) (T) (S) (ok) {state (#2.1)}'
    regex = r'([^(]*) (?:\([^(]*\) ){3}\(([^(]*)\) {[^}]*\(\#([^)]*)\)}'
    match = re.match(regex, thestr)
    
    print(match.groups())
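
As a possible variation on the compressed version, the captures can be given names so that the result reads as a dictionary rather than a positional tuple. This is a sketch only: the group names `name`, `flag` and `version` and the helper `parse` are assumptions made for illustration, and the braces are escaped for clarity.

```python
import re

# Compressed pattern from above with named groups (names are illustrative).
regex = re.compile(
    r'(?P<name>[^(]*) (?:\([^(]*\) ){3}\((?P<flag>[^(]*)\) '
    r'\{[^}]*\(#(?P<version>[^)]*)\)\}'
)


def parse(line):
    """Return the named captures as a dict, or None if the line does not match."""
    match = regex.match(line)
    return match.groupdict() if match else None


print(parse('Othername California (2000) (T) (S) (ok) {state (#2.1)}'))
# {'name': 'Othername California', 'flag': 'ok', 'version': '2.1'}

# Lines without exactly four parenthesized items before the braces do not match.
print(parse('Name1 Name2 (2002) {edu (#1.1)}'))
# None
```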
    
  • 2021-01-03 04:12

    Regex

    (.+)\s+\(\d+\).+?(?:\(([^)]{2,})\)\s+(?={))?\{.+\(#(\d+\.\d+)\)\}
    


    Text used for test

    Name1 Name2 Name3 (2000) {Education (#3.2)}
    Name1 Name2 Name3 (2000) (ok) {edu (#1.1)}
    Name1 Name2 (2002) {edu (#1.1)}
    Name1 Name2 Name3 (2000) (V) {variation (#4.12)}
    Othername California (2000) (T) (S) (ok) {state (#2.1)}
    

    Test

    >>> import re
    >>> regex = re.compile(r"(.+)\s+\(\d+\).+?(?:\(([^)]{2,})\)\s+(?={))?\{.+\(#(\d+\.\d+)\)\}")
    >>> r = regex.search(string)
    >>> r
    <_sre.SRE_Match object at 0x54e2105f36c16a48>
    >>> regex.match(string)
    <_sre.SRE_Match object at 0x54e2105f36c169e8>
    
    # Run findall
    >>> regex.findall(string)
    [
       (u'Name1 Name2 Name3'   , u''  , u'3.2'),
       (u'Name1 Name2 Name3'   , u'ok', u'1.1'),
       (u'Name1 Name2'         , u''  , u'1.1'),
       (u'Name1 Name2 Name3'   , u''  , u'4.12'),
       (u'Othername California', u'ok', u'2.1')
    ]
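
The session above is from Python 2 and leaves `string` undefined. A self-contained Python 3 sketch of the same test might look like the following, assuming `string` held the five test lines joined by newlines (the name `text` is used here instead).

```python
import re

# The five test lines from above, one per line.
text = """Name1 Name2 Name3 (2000) {Education (#3.2)}
Name1 Name2 Name3 (2000) (ok) {edu (#1.1)}
Name1 Name2 (2002) {edu (#1.1)}
Name1 Name2 Name3 (2000) (V) {variation (#4.12)}
Othername California (2000) (T) (S) (ok) {state (#2.1)}"""

# Same pattern as above, written as a raw string.
regex = re.compile(
    r"(.+)\s+\(\d+\).+?(?:\(([^)]{2,})\)\s+(?={))?\{.+\(#(\d+\.\d+)\)\}"
)

# findall returns one tuple per match; an optional group that did not
# participate in the match is reported as an empty string.
rows = regex.findall(text)
for row in rows:
    print(row)
```

The middle element is empty whenever the last parenthesized item before the brace is shorter than two characters, or is missing, because of the `{2,}` quantifier and the optional group.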
    