I have a huge data file with a specific string being repeated after a defined number of lines.
counting jump between first two 'Rank' occurrences. For example the file
Don't use .readlines() when a simple generator expression counting the lines without 'Rank' is enough:
count = sum(1 for l in open(filename) if 'Rank' not in l)
The expression 'Rank' not in l is all you need to test that the string 'Rank' does not occur in a line. Looping over the open file yields its lines one at a time, so the whole file is never read into memory at once. The sum() function adds up all the 1s, one of which is generated for each line not containing 'Rank', giving you a count of the lines without 'Rank' in them.
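To try the expression without a real file, here is a small sketch that runs it over an in-memory sample. The helper name count_lines_without and the sample text are made up for illustration; they are not from the question.

```python
import io

def count_lines_without(lines, marker='Rank'):
    """Count the lines in an iterable that do not contain marker."""
    return sum(1 for line in lines if marker not in line)

# Hypothetical sample: a 'Rank' line followed by three data lines, twice.
sample = io.StringIO(
    "Rank 1\n"
    "a\n"
    "b\n"
    "c\n"
    "Rank 2\n"
    "d\n"
    "e\n"
    "f\n"
)
count = count_lines_without(sample)
print(count)
```

Because the function accepts any iterable of lines, you can pass it an open file object just as well as the StringIO used here.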
If you need to count the lines from 'Rank' to 'Rank', you need a little itertools.takewhile magic:
import itertools
with open(filename) as f:
    # skip until we reach `Rank`; takewhile is lazy, so loop over it to consume the lines:
    for _ in itertools.takewhile(lambda l: 'Rank' not in l, f):
        pass
    # takewhile will have read a line with `Rank` now
    # count the lines *without* `Rank` between them
    count = sum(1 for l in itertools.takewhile(lambda l: 'Rank' not in l, f))
    count += 1  # we skipped at least one `Rank` line.
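The same idea can be wrapped in a function that takes any iterable of lines, which makes it easy to check on a small sample. This is only a sketch: the name rank_period and the sample data are assumptions of mine, and it assumes 'Rank' occurs at least twice in the input.

```python
import io
import itertools

def rank_period(lines, marker='Rank'):
    """Return the number of lines from the first marker line to the second.

    Assumes the marker occurs at least twice; if it occurs only once,
    the count simply runs to the end of the input.
    """
    lines = iter(lines)  # both takewhile calls must share one iterator

    def not_marker(line):
        return marker not in line

    # takewhile is lazy, so loop over it to skip everything up to and
    # including the first marker line.
    for _ in itertools.takewhile(not_marker, lines):
        pass
    # Count the lines between the first and second marker lines.
    between = sum(1 for _ in itertools.takewhile(not_marker, lines))
    # Add one for the marker line itself.
    return between + 1

# Hypothetical sample: one leading line, then 'Rank' every fourth line.
sample = io.StringIO("header\nRank 1\na\nb\nc\nRank 2\nd\ne\nf\n")
period = rank_period(sample)
print(period)
```

With a real file you would call it as rank_period(f) inside the with block; only the lines up to the second 'Rank' are read.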