Counting the jump (number of lines) between the first two 'String' occurrences in a file

天命终不由人 2021-01-22 09:27

I have a huge data file with a specific string being repeated after a defined number of lines.

I want to count the jump (the number of lines) between the first two 'Rank' occurrences. For example the file

4 Answers
  • 2021-01-22 09:45

    I assume you want to find the number of lines in each block, where a block starts with a line that contains 'Rank'. For example, there are 3 blocks in your sample: the 1st has 4 lines, the 2nd has 4 lines, and the 3rd has 1 line:

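    from itertools import groupby
    
    def block_start(line, start=[None]):
        # the mutable default keeps state between calls: the key flips on
        # every 'Rank' line, so groupby() starts a new group at each block
        if 'Rank' in line:
            start[0] = not start[0]
        return start[0]
    
    with open(filename) as file:  # filename: path to the data file
        block_sizes = [sum(1 for line in block)  # number of lines in a block
                       for _, block in groupby(file, key=block_start)]  # group lines into blocks
    print(block_sizes)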
    # -> [4, 4, 1]
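
A variant that avoids the state hidden in the default argument, as a minimal sketch. The helper name `count_block_sizes`, its `marker` parameter, and the sample lines are assumptions made up for illustration; they are not part of the question. It walks the lines once and keeps one running size per block.

```python
def count_block_sizes(lines, marker="Rank"):
    """Return the size of each block that starts with a line containing marker."""
    sizes = []
    for line in lines:
        if marker in line:
            sizes.append(1)    # a marker line opens a new block
        elif sizes:
            sizes[-1] += 1     # any other line extends the current block
        # lines before the first marker belong to no block and are skipped
    return sizes

# made-up sample: two blocks of 4 lines and a final block of 1 line
sample = [
    "Rank 1", "a", "b", "c",
    "Rank 2", "d", "e", "f",
    "Rank 3",
]
print(count_block_sizes(sample))  # -> [4, 4, 1]
```

Because the function accepts any iterable of lines, an open file object can be passed in place of the list.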
    

    If all blocks have the same number of lines, or you just want to find the number of lines in the first block that starts with 'Rank':

    count = None
    with open(filename) as file:
        for line in file:
            if 'Rank' in line:
                if count is None:  # found the start of the 1st block
                    count = 1
                else:  # found the start of the 2nd block
                    break
            elif count is not None:  # inside the 1st block
                count += 1
    print(count)  # -> 4
    
  • 2021-01-22 09:57

    Counting the jump between the first two 'Rank' occurrences:

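    def find_jumps(filename):
        first = True
        count = 0
        with open(filename) as f:
            for line in f:
                if 'Rank' in line:
                    if first:
                        # first 'Rank' line: restart the count; set this to 1
                        # if you want to include one of the 'Rank' lines
                        count = 0
                        first = False
                    else:
                        return count  # second 'Rank' line: done
                else:
                    count += 1
        return None  # fewer than two 'Rank' lines in the file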
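
The choice between starting at 0 or 1 comes down to whether a 'Rank' line is itself counted. A rough sketch of that relationship, assuming a hypothetical helper `first_two_positions` and made-up sample lines: it records the indices of the first two 'Rank' lines, so their difference is the jump and one less is the number of lines strictly between them.

```python
from itertools import islice

def first_two_positions(lines, marker="Rank"):
    """Return the 0-based indices of the first two lines containing marker."""
    hits = (i for i, line in enumerate(lines) if marker in line)
    return list(islice(hits, 2))  # stops reading once two hits are found

# made-up sample: 'Rank' lines at indices 1 and 5
sample = ["header", "Rank 1", "a", "b", "c", "Rank 2", "d"]
positions = first_two_positions(sample)
if len(positions) == 2:
    first, second = positions
    print(second - first)      # -> 4, counting one of the 'Rank' lines
    print(second - first - 1)  # -> 3, lines strictly between the two
```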
    
  • 2021-01-22 10:07

    Don't use .readlines() when a simple generator expression counting the lines without Rank is enough:

    count = sum(1 for l in open(filename) if 'Rank' not in l)
    

    'Rank' not in l is enough to test if the string 'Rank' is not present in a string. Looping over the open file is looping over all the lines. The sum() function will add up all the 1s, which are generated for each line not containing Rank, giving you a count of lines without Rank in them.
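
To see that expression at work without a data file, here is a small sketch. It assumes an in-memory `io.StringIO` object standing in for the opened file, and the sample text is made up.

```python
import io

# io.StringIO stands in for an opened text file (made-up sample data)
f = io.StringIO("Rank 1\na\nb\nRank 2\nc\n")

# iterating over f yields one line at a time, just like a real file
count = sum(1 for l in f if 'Rank' not in l)
print(count)  # -> 3
```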

    If you need to count the lines from Rank to Rank, you need a little itertools.takewhile magic:

    import itertools
    with open(filename) as f:
        # skip until we reach `Rank`; takewhile is lazy, so it has to be
        # consumed (here with a for loop) before it reads anything
        for _ in itertools.takewhile(lambda l: 'Rank' not in l, f):
            pass
        # takewhile will have read a line with `Rank` now
        # count the lines *without* `Rank` between them
        count = sum(1 for l in itertools.takewhile(lambda l: 'Rank' not in l, f))
        count += 1  # we skipped at least one `Rank` line.
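
The reason the second takewhile starts right after the first 'Rank' line is that takewhile also reads, and then discards, the first line that fails its test. A short sketch of that behaviour, assuming an in-memory `io.StringIO` in place of the file and made-up sample text:

```python
import io
from itertools import takewhile

# io.StringIO stands in for an opened text file (made-up sample data)
f = io.StringIO("x\nRank 1\na\nb\nc\nRank 2\nd\n")

def no_rank(line):
    return 'Rank' not in line

skipped = list(takewhile(no_rank, f))  # yields "x", then swallows "Rank 1"
between = list(takewhile(no_rank, f))  # yields a, b, c, then swallows "Rank 2"
rest = list(f)                         # whatever is left after the second 'Rank'

print(skipped)           # -> ['x\n']
print(len(between) + 1)  # -> 4
print(rest)              # -> ['d\n']
```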
    
  • 2021-01-22 10:08

    7 lines of code:

    count = 0
    for line in open("yourfile.txt"):
        if "Rank" in line:
            if count > 0: break  # second 'Rank' line: stop before counting it
            count = 1            # first 'Rank' line starts the count
        elif count > 0: count += 1
    print(count)
    