Regex nested parenthesis in python

前端未结

关注

 4  1939

I have something like this:

Othername California (2000) (T) (S) (ok) {state (#2.1)}

Is there a regex code to obtain:

Other


                      
              相关标签:


      
      
        
          4条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  臣服心动        
                
              
                            
                2021-01-03 04:00
              
            
            
                                                                       
Despite what I have said in the comments. I've found a way around:

(?(?=\([^()\w]*[\w.]+[^()\w]*\))\([^()\w]*([\w.]+)[^()\w]*\)|.)(?=[^{]*\})|(?<!\()(\b\w+\b)(?!\()|ok


Explanation:

(?                                  # If
(?=\([^()\w]*[\w.]+[^()\w]*\))      # There is (anything except [()\w] zero or more times, followed by [\w.] one or more times, followed by anything except [()\w] zero or more times)
\([^()\w]*([\w.]+)[^()\w]*\)        # Then match it, and put [\w.] in a group
|                                   # else
.                                   # advance with one character
)                                   # End if
(?=[^{]*\})                         # Look ahead if there is anything except { zero or more times followed by }

|                                   # Or
(?<!\()(\b\w+\b)(?!\()              # Match a word not enclosed between parenthesis
|                                   # Or
ok                                  # Match ok


Online demo
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  小蘑菇        
                
              
                            
                2021-01-03 04:02
              
            
            
                                                                       
other case is:

^(\w+\s?\w+)\s?\(\d{1,}\)\s?\(\w+\)\s?\(\w+\)\s?\((\w+)\)\s?.*#(\d.\d)

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  灰色年华        
                
              
                            
                2021-01-03 04:04
              
            
            
                                                                       
Try this one:

import re

thestr = 'Othername California (2000) (T) (S) (ok) {state (#2.1)}'

regex = r'''
    ([^(]*)             # match anything but a (
    \                   # a space
    (?:                 # non capturing parentheses
        \([^(]*\)       # parentheses
        \               # a space
    ){3}                # three times
    \(([^(]*)\)         # capture fourth parentheses contents
    \                   # a space
    {                   # opening {
        [^}]*           # anything but }
        \(\#            # opening ( followed by #
            ([^)]*)     # match anything but )
        \)              # closing )
    }                   # closing }
'''

match = re.match(regex, thestr, re.X)

print match.groups()


Output:

('Othername California', 'ok', '2.1')


And here's the compressed version:

import re

thestr = 'Othername California (2000) (T) (S) (ok) {state (#2.1)}'
regex = r'([^(]*) (?:\([^(]*\) ){3}\(([^(]*)\) {[^}]*\(\#([^)]*)\)}'
match = re.match(regex, thestr)

print match.groups()

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  轻奢々        
                
              
                            
                2021-01-03 04:12
              
            
            
                                                                       
Regex

(.+)\s+\(\d+\).+?(?:\(([^)]{2,})\)\s+(?={))?\{.+\(#(\d+\.\d+)\)\}




Text used for test

Name1 Name2 Name3 (2000) {Education (#3.2)}
Name1 Name2 Name3 (2000) (ok) {edu (#1.1)}
Name1 Name2 (2002) {edu (#1.1)}
Name1 Name2 Name3 (2000) (V) {variation (#4.12)}
Othername California (2000) (T) (S) (ok) {state (#2.1)}


Test

>>> regex = re.compile("(.+)\s+\(\d+\).+?(?:\(([^)]{2,})\)\s+(?={))?\{.+\(#(\d+\.\d+)\)\}")
>>> r = regex.search(string)
>>> r
<_sre.SRE_Match object at 0x54e2105f36c16a48>
>>> regex.match(string)
<_sre.SRE_Match object at 0x54e2105f36c169e8>

# Run findall
>>> regex.findall(string)
[
   (u'Name1 Name2 Name3'   , u''  , u'3.2'),
   (u'Name1 Name2 Name3'   , u'ok', u'1.1'),
   (u'Name1 Name2'         , u''  , u'1.1'),
   (u'Name1 Name2 Name3'   , u''  , u'4.12'),
   (u'Othername California', u'ok', u'2.1')
]

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复