The tuples inside the file:
('Wanna', 'O')
('be', 'O')
('like', 'O')
('Alexander', 'B')
('Coughan', 'I')
('?', 'O')
Here's how I'd write this:
from ast import literal_eval
from itertools import tee

def pairwise(iterable):  # from the itertools recipes
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

with open("a.txt") as f:
    for p0, p1 in pairwise(map(literal_eval, f)):
        if p0[1] == 'B' and p1[1] == 'I':
            # str.join takes a single iterable, so pass the two words as a tuple
            print(' '.join((p0[0], p1[0])))
            break
Here's why:
Your file consists of what appear to be reprs of Python tuples of two strings. That's a really bad format, and if you can change the way you've stored your data, you should. But if it's too late and you have to parse it, literal_eval is the best answer.
So, we turn each line in the file into a tuple by mapping literal_eval over the file.
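If you want to see that step on its own, here's a small sketch. It uses io.StringIO as a stand-in for the open file (that part is my assumption, just so the example runs without a.txt on disk); iterating it yields one line at a time, the same way a real file object does.

```python
from ast import literal_eval
from io import StringIO

# Stand-in for the open file: iterating it yields one line at a time,
# trailing newline included, just like a real file object.
fake_file = StringIO("('Wanna', 'O')\n('be', 'O')\n")

# literal_eval parses each line as a Python literal, giving back real tuples.
rows = list(map(literal_eval, fake_file))
print(rows)  # [('Wanna', 'O'), ('be', 'O')]
```

Note that literal_eval raises on a line that isn't a valid literal (a blank line, for example), so a file with stray empty lines would need them filtered out first.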
Then we use pairwise from the itertools recipes to convert the iterable of tuples into an iterable of adjacent pairs of tuples.
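Here's a quick illustration of what pairwise produces, using a hand-written list of tuples in place of the parsed file (the list is just sample data taken from the lines above). If you're on Python 3.10 or later, itertools.pairwise is built in and you can import it instead of defining the recipe yourself.

```python
from itertools import tee

def pairwise(iterable):
    # Make two independent iterators over the same data...
    a, b = tee(iterable)
    # ...advance the second one by a single step...
    next(b, None)
    # ...and zip them so each item is paired with its successor.
    return zip(a, b)

rows = [('like', 'O'), ('Alexander', 'B'), ('Coughan', 'I'), ('?', 'O')]
pairs = list(pairwise(rows))
for pair in pairs:
    print(pair)
# (('like', 'O'), ('Alexander', 'B'))
# (('Alexander', 'B'), ('Coughan', 'I'))
# (('Coughan', 'I'), ('?', 'O'))
```

Four rows give three overlapping pairs, so every line gets compared with the one right after it.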
So, now, inside the loop, p0 and p1 will be the tuples from adjacent lines, and you can just write exactly what you described: if p0[1] is 'B' and it's followed by (that is, p1[1] is) 'I', join the two [0]s.
I'm not sure what you wanted to do with the joined string, so I just printed it out. I'm also not sure if you want to handle multiple values or just the first, so I put in a break.
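If it turns out you want every match rather than just the first, one possible variation is to drop the break and yield each joined string from a generator. This is only a sketch: joined_names is a name I made up, and io.StringIO stands in for the real file so the example is self-contained.

```python
from ast import literal_eval
from io import StringIO
from itertools import tee

def pairwise(iterable):  # from the itertools recipes
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

def joined_names(lines):
    """Yield 'first second' for each 'B' line directly followed by an 'I' line."""
    for p0, p1 in pairwise(map(literal_eval, lines)):
        if p0[1] == 'B' and p1[1] == 'I':
            yield ' '.join((p0[0], p1[0]))

# Stand-in for open("a.txt"); any iterable of lines works here.
sample = StringIO(
    "('Wanna', 'O')\n"
    "('be', 'O')\n"
    "('like', 'O')\n"
    "('Alexander', 'B')\n"
    "('Coughan', 'I')\n"
    "('?', 'O')\n"
)

names = list(joined_names(sample))
print(names)  # ['Alexander Coughan']
```

This still only looks at two lines at a time, so a 'B' followed by several 'I' lines would give you just the first two words; handling longer runs would need a different loop.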