Extracting multiple strings using Python's regular expressions

后端未结

I have a log file with the following output. I have shortened it here because it runs to thousands of lines:

Time = 1

smoothSolver:  Solving for Ux, Initial residual

3 answers

轮回少年
2021-01-27 22:18

Your code overwrites iteration_time on every iteration of the for loop, so the value found on a Time line is lost as soon as the next line is read. That is the problem. You need to stop overwriting this variable on lines where the Time search finds nothing.

To do this, set iteration_time to None before the loop and assign to it only when the regex search for Time actually matches. You can do something like this:

import re

# logFile is the path to your log file, as in your original code
with open(logFile, 'r') as logfile_read, open('contCumulative_0', 'w') as contCumulative_0_out:
    iteration_time = None
    for line in logfile_read:
        line = line.rstrip()
        time_match = re.match(r'Time = ([0-9]+)', line)
        if time_match:
            iteration_time = time_match.group(1)
            print(iteration_time)
        else:  # Because if there is time_match, there is no 'cumulative = ...'
            contCumulative_0 = re.search(r'cumulative = ([-+\d.eE]+)', line)
            if contCumulative_0:
                cumvalue = contCumulative_0.group(1)
                # You can check and use iteration_time here
                contCumulative_0_out.write(cumvalue + '\n')

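If it helps to test the logic before pointing it at the real log, the same idea can be wrapped in a small function that collects (time, cumulative) pairs instead of writing straight to a file. This is only a sketch: the function name pair_time_with_cumulative is hypothetical, and the patterns assume the line layout posted further down this thread.

```python
import re

TIME_RE = re.compile(r'^Time = ([0-9]+)')
CUM_RE = re.compile(r'cumulative = ([-+\d.eE]+)')


def pair_time_with_cumulative(lines):
    """Return (time, cumulative) string pairs from an iterable of log lines.

    Hypothetical helper: iteration_time is only reassigned when a Time line
    matches, so it keeps its value across all the other lines.
    """
    pairs = []
    iteration_time = None
    for line in lines:
        line = line.rstrip()
        time_match = TIME_RE.match(line)
        if time_match:
            iteration_time = time_match.group(1)
            continue  # a Time line never carries 'cumulative = ...'
        cum_match = CUM_RE.search(line)
        if cum_match and iteration_time is not None:
            pairs.append((iteration_time, cum_match.group(1)))
    return pairs


sample = [
    "Time = 1",
    "time step continuity errors : sum local = 0.00999678, global = 0.00142109, cumulative = 0.00142109",
    "ExecutionTime = 4.84 s  ClockTime = 5 s",
    "Time = 2",
    "time step continuity errors : sum local = 0.00999678, global = 0.00142109, cumulative = 0.00123456",
]
print(pair_time_with_cumulative(sample))
# [('1', '0.00142109'), ('2', '0.00123456')]
```

Because iteration_time only changes on a matching Time line, each cumulative value is paired with the most recent time step.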

Hope this helps.

温柔的废话
2021-01-27 22:27

When a line contains neither 'Time' nor 'cumulative', there is no need to overwrite that variable. You can do this:

...
with open(logFile, 'r') as logfile_read:
    for line in logfile_read:
        line = line.rstrip()
        # startswith() instead of 'Time' in line, which is also true for ExecutionTime lines
        if line.startswith('Time ='):
            iteration_time = re.findall(r'^Time = ([0-9]+)', line)
            print(iteration_time)
        if 'cumulative' in line:
            contCumulative_0 = re.search(r'cumulative = ([-+\d.eE]+)', line)
            if contCumulative_0:
                cumvalue = contCumulative_0.group(1)
                contCumulative_0_out.write(cumvalue + '\n')
...
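
One thing to watch with a bare 'Time' in line test: in logs like the sample further down this thread, the "ExecutionTime = 4.84 s  ClockTime = 5 s" line also contains 'Time', so the test passes there and the findall result (an empty list) overwrites iteration_time. The hypothetical helper time_guards below is a quick sketch comparing the substring test with an anchored startswith('Time =') check.

```python
import re


def time_guards(line):
    """Compare a substring test with an anchored test for a 'Time = N' line."""
    substring_hit = 'Time' in line            # also true for 'ExecutionTime'
    anchored_hit = line.startswith('Time =')  # only true for the time-step header
    found = re.findall(r'^Time = ([0-9]+)', line)
    return substring_hit, anchored_hit, found


for line in ["Time = 1", "ExecutionTime = 4.84 s  ClockTime = 5 s"]:
    print(repr(line), time_guards(line))
# 'Time = 1' (True, True, ['1'])
# 'ExecutionTime = 4.84 s  ClockTime = 5 s' (True, False, [])
```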

不要未来只要你来
2021-01-27 22:29

You can do this with a single regex, assuming your log format is the same for every entry. An explanation of what is going on is below:

import re

s = """Time = 1

smoothSolver:  Solving for Ux, Initial residual = 0.230812, Final residual = 0.0134171, No Iterations 2
smoothSolver:  Solving for Uy, Initial residual = 0.283614, Final residual = 0.0158797, No Iterations 3
smoothSolver:  Solving for Uz, Initial residual = 0.190444, Final residual = 0.016567, No Iterations 2
GAMG:  Solving for p, Initial residual = 0.0850116, Final residual = 0.00375608, No Iterations 3
time step continuity errors : sum local = 0.00999678, global = 0.00142109, cumulative = 0.00142109
smoothSolver:  Solving for omega, Initial residual = 0.00267604, Final residual = 0.000166675, No Iterations 3
bounding omega, min: -26.6597 max: 18468.7 average: 219.43
smoothSolver:  Solving for k, Initial residual = 1, Final residual = 0.0862096, No Iterations 2
ExecutionTime = 4.84 s  ClockTime = 5 s

Time = 2

smoothSolver:  Solving for Ux, Initial residual = 0.230812, Final residual = 0.0134171, No Iterations 2
smoothSolver:  Solving for Uy, Initial residual = 0.283614, Final residual = 0.0158797, No Iterations 3
smoothSolver:  Solving for Uz, Initial residual = 0.190444, Final residual = 0.016567, No Iterations 2
GAMG:  Solving for p, Initial residual = 0.0850116, Final residual = 0.00375608, No Iterations 3
time step continuity errors : sum local = 0.00999678, global = 0.00142109, cumulative = 0.00123456
smoothSolver:  Solving for omega, Initial residual = 0.00267604, Final residual = 0.000166675, No Iterations 3
bounding omega, min: -26.6597 max: 18468.7 average: 219.43
smoothSolver:  Solving for k, Initial residual = 1, Final residual = 0.0862096, No Iterations 2
ExecutionTime = 4.84 s  ClockTime = 5 s
"""

regex = re.compile(r"^Time = (\d+).*?cumulative = (\d{0,10}\.\d{0,10})", re.DOTALL | re.MULTILINE)

for x in regex.findall(s):
    print("{} => {}".format(x[0], x[1]))

This outputs two results (because I've added two log entries instead of just the one you provided):

1 => 0.00142109
2 => 0.00123456

What is happening?

The regex being used is this:

^Time = (\d+).*?cumulative = (\d{0,10}\.\d{0,10})

This regex looks for the Time = string at the beginning of a line and captures the digits that follow it. It then does a non-greedy match up to the string cumulative = and captures the number that follows that. The non-greedy .*? is important; otherwise you would get only one result for your entire log, because a greedy .* would match from the first instance of Time = to the last instance of cumulative =.
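
To see the difference the non-greedy quantifier makes, the sketch below runs three variants of the pattern over two abbreviated entries (the second entry's time is changed to 12 purely for illustration). It also shows why the digit group should stay greedy: a lazy (\d+?) followed by .*? is satisfied by a single digit, so Time = 12 would be captured as 1.

```python
import re

# Two abbreviated entries; the second time is set to 12 only for illustration
log = (
    "Time = 1\n"
    "time step continuity errors : sum local = 0.00999678, global = 0.00142109, cumulative = 0.00142109\n"
    "Time = 12\n"
    "time step continuity errors : sum local = 0.00999678, global = 0.00142109, cumulative = 0.00123456\n"
)
flags = re.DOTALL | re.MULTILINE

# Greedy .* runs to the LAST 'cumulative =' in the text: one match only
greedy = re.findall(r"^Time = (\d+).*cumulative = (\d{0,10}\.\d{0,10})", log, flags)
# Non-greedy .*? stops at the FIRST 'cumulative =' after each Time line
lazy = re.findall(r"^Time = (\d+).*?cumulative = (\d{0,10}\.\d{0,10})", log, flags)
# A lazy digit group gives up after one digit, so 12 becomes 1
lazy_digits = re.findall(r"^Time = (\d+?).*?cumulative = (\d{0,10}\.\d{0,10})", log, flags)

print(greedy)       # [('1', '0.00123456')]
print(lazy)         # [('1', '0.00142109'), ('12', '0.00123456')]
print(lazy_digits)  # [('1', '0.00142109'), ('1', '0.00123456')]
```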

It then prints each result. Each result is a tuple containing the time value and the cumulative value. This part of the code can be modified to write to a file if required.
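
As a sketch of the file-writing variant, the function below writes one row per entry using the standard csv module. The names write_pairs and contCumulative.csv are hypothetical, and the pattern is this answer's regex with the time captured by a greedy (\d+).

```python
import csv
import re

# This answer's regex, with the time captured by a greedy (\d+)
PATTERN = re.compile(
    r"^Time = (\d+).*?cumulative = (\d{0,10}\.\d{0,10})",
    re.DOTALL | re.MULTILINE,
)


def write_pairs(log_text, out_path):
    """Write a 'time,cumulative' CSV row per log entry and return the rows.

    write_pairs and out_path are hypothetical names used for this sketch.
    """
    rows = PATTERN.findall(log_text)
    # newline="" is what the csv module expects for files it writes to
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["time", "cumulative"])  # header row
        writer.writerows(rows)
    return rows


log_text = (
    "Time = 1\n"
    "time step continuity errors : sum local = 0.00999678, global = 0.00142109, cumulative = 0.00142109\n"
    "Time = 2\n"
    "time step continuity errors : sum local = 0.00999678, global = 0.00142109, cumulative = 0.00123456\n"
)
print(write_pairs(log_text, "contCumulative.csv"))
# [('1', '0.00142109'), ('2', '0.00123456')]
```

A CSV file is just one choice; writing the same "time => cumulative" lines the print loop produces would work equally well.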

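This regex works across multiple lines because it uses two flags: DOTALL, which lets . match newline characters so that .*? can run from the Time line down to the cumulative line, and MULTILINE, which makes ^ match at the start of every line rather than only at the start of the string.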
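
The sketch below drops one flag at a time on two abbreviated entries from the sample log and prints what findall returns in each case.

```python
import re

# Two abbreviated entries in the same layout as the sample log above
log = (
    "Time = 1\n"
    "time step continuity errors : sum local = 0.00999678, global = 0.00142109, cumulative = 0.00142109\n"
    "Time = 2\n"
    "time step continuity errors : sum local = 0.00999678, global = 0.00142109, cumulative = 0.00123456\n"
)
pattern = r"^Time = (\d+).*?cumulative = (\d{0,10}\.\d{0,10})"

both = re.findall(pattern, log, re.DOTALL | re.MULTILINE)
no_dotall = re.findall(pattern, log, re.MULTILINE)  # '.' stops at '\n', so no line has both parts
no_multiline = re.findall(pattern, log, re.DOTALL)  # '^' only matches at index 0

print(both)          # [('1', '0.00142109'), ('2', '0.00123456')]
print(no_dotall)     # []
print(no_multiline)  # [('1', '0.00142109')]
```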