Extracting multiple strings using Python's regular expressions

北荒 2021-01-27 21:42

I have a log file with the following output. I have shortened it here, as the full file runs to thousands of lines:

Time = 1

smoothSolver:  Solving for Ux, Initial residual         


        
3 Answers
  • 2021-01-27 22:18

    Your code overwrites iteration_time on every iteration of the for loop. That is the problem. You need to stop overwriting this variable once it has been populated, until the next Time line is found.

    To do this, set iteration_time to None before the loop and assign to it only when the regex search for Time succeeds. You can do something like this:

    import re

    # logFile is the path to the log file, as in the question.
    with open(logFile, 'r') as logfile_read, \
            open('contCumulative_0', 'w') as contCumulative_0_out:
        iteration_time = None
        for line in logfile_read:
            line = line.rstrip()
            # re.match only matches at the start of the line
            time_match = re.match(r'Time = ([0-9]+)', line)
            if time_match:
                iteration_time = time_match.group(1)
                print(iteration_time)
            else:  # A 'Time = ...' line never contains 'cumulative = ...'
                # \S+ also accepts signs and exponents such as -1.2e-05
                contCumulative_0 = re.search(r'cumulative = (\S+)', line)
                if contCumulative_0:
                    cumvalue = contCumulative_0.group(1)
                    # You can check and use iteration_time here
                    contCumulative_0_out.write(cumvalue + '\n')
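
    As a minimal sketch of how the remembered time can be paired with each cumulative value, assuming Python 3 and that every cumulative line follows the Time line it belongs to. The function name pair_time_with_cumulative and the sample lines are illustrative, not part of the original code:

```python
import re

TIME_RE = re.compile(r'^Time = ([0-9]+)')
CUMULATIVE_RE = re.compile(r'cumulative = (\S+)')


def pair_time_with_cumulative(lines):
    """Return (time, cumulative) string pairs from an iterable of log lines."""
    pairs = []
    iteration_time = None  # last 'Time = N' value seen so far
    for line in lines:
        line = line.rstrip()
        time_match = TIME_RE.match(line)
        if time_match:
            iteration_time = time_match.group(1)
            continue
        cumulative_match = CUMULATIVE_RE.search(line)
        # Skip cumulative lines that appear before any Time line
        if cumulative_match and iteration_time is not None:
            pairs.append((iteration_time, cumulative_match.group(1)))
    return pairs


# Hypothetical sample lines in the log format quoted in this thread.
sample = [
    "Time = 1\n",
    "time step continuity errors : sum local = 0.00999678, global = 0.00142109, cumulative = 0.00142109\n",
    "ExecutionTime = 4.84 s  ClockTime = 5 s\n",
    "Time = 2\n",
    "time step continuity errors : sum local = 0.00999678, global = 0.00142109, cumulative = 0.00123456\n",
]
pairs = pair_time_with_cumulative(sample)
for time_value, cumulative in pairs:
    print(time_value, cumulative)
```

    Passing an open file object instead of the list should work the same way, since the function only iterates over lines.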
    

    Hope this helps.

  • 2021-01-27 22:27

    When a line contains neither 'Time' nor 'cumulative', there is no need to overwrite that variable. You can do this:

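    ...
    with open(logFile, 'r') as logfile_read:
        for line in logfile_read:
            line = line.rstrip()
            if 'Time' in line:
                # 'ExecutionTime = ...' also contains 'Time', so only
                # update the value when the regex really matches
                time_match = re.match(r'Time = ([0-9]+)', line)
                if time_match:
                    iteration_time = time_match.group(1)
                    print(iteration_time)
            if 'cumulative' in line:
                # \S+ also accepts signs and exponents such as -1.2e-05
                contCumulative_0 = re.search(r'cumulative = (\S+)', line)
                if contCumulative_0:
                    cumvalue = contCumulative_0.group(1)
                    contCumulative_0_out.write(cumvalue + '\n')
    ...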
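
    One thing to watch for with the substring test: 'Time' in line is also true for the ExecutionTime / ClockTime lines in this log format, so a stricter prefix check may be safer. A small sketch, using two lines in the log format quoted in this thread (the variable names are illustrative):

```python
lines = [
    "Time = 2",
    "ExecutionTime = 4.84 s  ClockTime = 5 s",
]

# Substring test: true for both lines, because 'ExecutionTime' contains 'Time'.
substring_hits = [line for line in lines if 'Time' in line]

# Prefix test: true only for the iteration header line.
prefix_hits = [line for line in lines if line.startswith('Time = ')]

print(substring_hits)
print(prefix_hits)
```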
    
  • You can do this with a regex, assuming that your log format is the same for all of your entries. The explanation of what is going on is below:

    import re
    
    s = """Time = 1
    
    smoothSolver:  Solving for Ux, Initial residual = 0.230812, Final residual = 0.0134171, No Iterations 2
    smoothSolver:  Solving for Uy, Initial residual = 0.283614, Final residual = 0.0158797, No Iterations 3
    smoothSolver:  Solving for Uz, Initial residual = 0.190444, Final residual = 0.016567, No Iterations 2
    GAMG:  Solving for p, Initial residual = 0.0850116, Final residual = 0.00375608, No Iterations 3
    time step continuity errors : sum local = 0.00999678, global = 0.00142109, cumulative = 0.00142109
    smoothSolver:  Solving for omega, Initial residual = 0.00267604, Final residual = 0.000166675, No Iterations 3
    bounding omega, min: -26.6597 max: 18468.7 average: 219.43
    smoothSolver:  Solving for k, Initial residual = 1, Final residual = 0.0862096, No Iterations 2
    ExecutionTime = 4.84 s  ClockTime = 5 s
    
    Time = 2
    
    smoothSolver:  Solving for Ux, Initial residual = 0.230812, Final residual = 0.0134171, No Iterations 2
    smoothSolver:  Solving for Uy, Initial residual = 0.283614, Final residual = 0.0158797, No Iterations 3
    smoothSolver:  Solving for Uz, Initial residual = 0.190444, Final residual = 0.016567, No Iterations 2
    GAMG:  Solving for p, Initial residual = 0.0850116, Final residual = 0.00375608, No Iterations 3
    time step continuity errors : sum local = 0.00999678, global = 0.00142109, cumulative = 0.00123456
    smoothSolver:  Solving for omega, Initial residual = 0.00267604, Final residual = 0.000166675, No Iterations 3
    bounding omega, min: -26.6597 max: 18468.7 average: 219.43
    smoothSolver:  Solving for k, Initial residual = 1, Final residual = 0.0862096, No Iterations 2
    ExecutionTime = 4.84 s  ClockTime = 5 s
    """
    
    regex = re.compile(r"^Time = (\d+).*?cumulative = (\d{0,10}\.\d{0,10})", re.DOTALL | re.MULTILINE)
    
    for x in regex.findall(s):
        print("{} => {}".format(x[0], x[1]))
    

    This outputs two results (because I've added two log entries, instead of just the one you provided):

    1 => 0.00142109
    2 => 0.00123456
    

    What is happening?

    The regex being used is this:

    ^Time = (\d+).*?cumulative = (\d{0,10}\.\d{0,10})
    

    This regex looks for your Time = string at the beginning of a line and captures the digits that follow. The time group is a greedy \d+ so that multi-digit values such as Time = 10 are captured in full. It then does a non-greedy match up to the string cumulative = and captures the number that follows. The non-greedy match is important: otherwise you'd get only one result for your entire log, because it would match the first instance of Time = and the last instance of cumulative =.
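
    To make the greedy versus non-greedy difference concrete, here is a small sketch. It assumes a shortened, hypothetical two-entry sample and a slightly simplified pattern, and differs only in .*? versus .* between the two calls:

```python
import re

# Shortened two-entry sample in the same format (hypothetical values).
s = (
    "Time = 1\n"
    "global = 0.00142109, cumulative = 0.00142109\n"
    "Time = 2\n"
    "global = 0.00142109, cumulative = 0.00123456\n"
)

flags = re.DOTALL | re.MULTILINE

# Non-greedy: stops at the first 'cumulative =' after each 'Time ='.
lazy = re.findall(r"^Time = (\d+).*?cumulative = (\d+\.\d+)", s, flags)

# Greedy: runs to the last 'cumulative =' in the whole string.
greedy = re.findall(r"^Time = (\d+).*cumulative = (\d+\.\d+)", s, flags)

print(lazy)    # [('1', '0.00142109'), ('2', '0.00123456')]
print(greedy)  # [('1', '0.00123456')]
```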

    It then prints each result. Each captured result contains the time value and the cumulative value. This portion of the code can be modified to print to a file if required.
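
    A minimal sketch of writing the results to a file instead of printing them, assuming Python 3 and CSV output. The function name write_pairs and the file name cumulative.csv are illustrative, and an in-memory buffer stands in for the real file here:

```python
import csv
import io
import re

regex = re.compile(r"^Time = (\d+).*?cumulative = (\d+\.\d+)",
                   re.DOTALL | re.MULTILINE)


def write_pairs(log_text, out_file):
    """Write one 'time,cumulative' row per log entry; return the row count."""
    writer = csv.writer(out_file)
    writer.writerow(["time", "cumulative"])
    count = 0
    for time_value, cumulative in regex.findall(log_text):
        writer.writerow([time_value, cumulative])
        count += 1
    return count


# Hypothetical shortened sample in the same log format.
log_text = (
    "Time = 1\n"
    "sum local = 0.00999678, global = 0.00142109, cumulative = 0.00142109\n"
    "Time = 2\n"
    "sum local = 0.00999678, global = 0.00142109, cumulative = 0.00123456\n"
)

# For a real file: open('cumulative.csv', 'w', newline='')
buffer = io.StringIO()
rows = write_pairs(log_text, buffer)
print(buffer.getvalue())
```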

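    This regex works across multiple lines because it uses two flags: DOTALL, which lets . match newline characters, and MULTILINE, which lets ^ match at the start of every line rather than only at the start of the string.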
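
    The effect of the two flags can be checked with a short sketch. The sample string and the slightly simplified pattern below are assumptions for illustration:

```python
import re

# Hypothetical sample: the entry does not start at the beginning of the string.
s = "ExecutionTime = 4.84 s\nTime = 2\nglobal = 0.1, cumulative = 0.5\n"
pattern = r"^Time = (\d+).*?cumulative = (\d+\.\d+)"

# No flags: ^ matches only at the very start of the string.
no_flags = re.findall(pattern, s)

# MULTILINE only: ^ matches at 'Time = 2', but . cannot cross the newline.
only_multiline = re.findall(pattern, s, re.MULTILINE)

# Both flags: ^ matches at each line start and . spans newlines.
both = re.findall(pattern, s, re.DOTALL | re.MULTILINE)

print(no_flags)        # []
print(only_multiline)  # []
print(both)            # [('2', '0.5')]
```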
