What's a Pythonic way to read a line from a file without advancing the current position in the file?
For example, given a file containing:
cat1
cat2
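Manually doing it is not that hard. Open the file in binary mode, because Python 3 only allows nonzero seeks relative to the current position on binary files:

```python
f = open('file.txt', 'rb')
line = f.readline()
print(line)
# b'cat1\n'

# Step back by the number of bytes just read. len(line) already includes
# the trailing b'\n', so no "+ 1" is needed.
# The second argument (whence=1) makes the offset relative to the current position.
f.seek(-len(line), 1)
line = f.readline()
print(line)
# b'cat1\n'
```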
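You can wrap this in a function like this. Since the file object is moved in place, there is no need to return it:

```python
def lookahead_line(f):
    """Read the next line from a binary file object, then seek back before it."""
    line = f.readline()
    f.seek(-len(line), 1)  # undo the read; f must be opened in binary mode
    return line
```

And use it like this:

```python
with open('file.txt', 'rb') as f:
    line = lookahead_line(f)
    print(line)  # b'cat1\n'
```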
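To try the relative-seek trick without a file on disk, here is a sketch assuming Python 3, with an in-memory io.BytesIO stream standing in for file.txt:

```python
import io

# In-memory binary stream standing in for file.txt (an assumption for the demo).
f = io.BytesIO(b"cat1\ncat2\n")

line = f.readline()               # reads b'cat1\n' and advances 5 bytes
f.seek(-len(line), io.SEEK_CUR)   # step back exactly the bytes just read
again = f.readline()              # reads b'cat1\n' a second time

print(line, again)
```

With a text-mode file, the same f.seek(-n, 1) call raises io.UnsupportedOperation in Python 3, which is why binary mode is used here.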
Hope this helps!
Solutions based on tell()/seek() will not work with non-seekable streams such as stdin connected to a pipe or terminal, or with iterators in general. A more generic implementation can be as simple as this:
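```python
class lookahead_iterator:
    """Wrap any iterable so that items can be peeked at before being consumed.

    Each call to peek() looks one item further ahead; next() then returns
    the buffered items in order before reading from the underlying iterator.
    """
    __slots__ = ["_buffer", "_iterator", "_next"]

    def __init__(self, iterable):
        self._buffer = []
        self._iterator = iter(iterable)
        self._next = self._iterator.__next__

    def __iter__(self):
        return self

    def _next_peeked(self):
        v = self._buffer.pop(0)
        if not self._buffer:
            # Buffer drained: go back to reading straight from the iterator.
            self._next = self._iterator.__next__
        return v

    def __next__(self):
        return self._next()

    def peek(self):
        v = next(self._iterator)
        self._buffer.append(v)
        self._next = self._next_peeked
        return v
```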
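Usage (parse_bash, parse_c and parse_cpp stand for your own parser functions, and parse_source is only an illustrative wrapper so that return is valid):

```python
def parse_source(path):
    with open(path, "r") as f:
        lines = lookahead_iterator(f)
        magic = lines.peek()  # first line; next() will still return it
        if magic.startswith("#"):
            return parse_bash(lines)
        if magic.startswith("/*"):
            return parse_c(lines)
        if magic.startswith("//"):
            return parse_cpp(lines)
        raise ValueError("Unrecognized file")
```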
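Note that each peek() call on lookahead_iterator fetches another item, so two consecutive peeks return the first and then the second line. If you would rather have repeated peek() calls return the same item, a minimal sketch with a one-item buffer follows; the class name Peekable and its default parameter are my own choices, not an existing library API:

```python
_MISSING = object()  # sentinel meaning "nothing buffered / no default given"


class Peekable:
    """Iterator wrapper whose peek() returns the next item without consuming it.

    Repeated peek() calls return the same item; only next() advances.
    Works on any iterable, including non-seekable streams.
    """

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._peeked = _MISSING

    def __iter__(self):
        return self

    def __next__(self):
        if self._peeked is not _MISSING:
            # Hand out the buffered item and clear the buffer.
            item, self._peeked = self._peeked, _MISSING
            return item
        return next(self._it)

    def peek(self, default=_MISSING):
        if self._peeked is _MISSING:
            try:
                self._peeked = next(self._it)
            except StopIteration:
                if default is _MISSING:
                    raise
                return default
        return self._peeked
```

Wrapping an open file works the same way: Peekable(f).peek() returns the first line, and iterating afterwards still starts from that line.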
As far as I know, there's no builtin functionality for this on text files, but such a function is easy to write, since most Python file objects support the seek and tell methods for jumping around within a file. So, the process is very simple:

1. Find the current file pointer using tell.
2. Perform a read (or write) operation of some kind.
3. seek back to the previous file pointer.

This allows you to do nice things like read a chunk of data from the file, analyze it, and then potentially overwrite it with different data. A simple wrapper for the functionality might look like:
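```python
def peek_line(f):
    pos = f.tell()       # remember the current position
    line = f.readline()
    f.seek(pos)          # jump back to it
    return line

with open('file.txt') as f:
    print(peek_line(f), end='')  # cat1
    print(peek_line(f), end='')  # cat1
```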
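The same three steps (tell, read or write, seek back) can also be packaged as a context manager, so whatever happens inside a with block is undone positionally, even if it raises. This is a sketch using contextlib.contextmanager; the name preserved_position is made up for illustration:

```python
import contextlib
import io


@contextlib.contextmanager
def preserved_position(f):
    """Remember f's position, run the with-block, then seek back (even on error)."""
    pos = f.tell()       # step 1: remember where we are
    try:
        yield f          # step 2: the caller reads (or writes) inside the block
    finally:
        f.seek(pos)      # step 3: jump back to the saved position


with io.StringIO("cat1\ncat2\n") as f:
    with preserved_position(f):
        header = f.readline()   # peeked line
    first = f.readline()        # same line, read "for real"
```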
You could implement the same thing for other read methods just as easily. For instance, here is the same thing for file.read:
```python
def peek(f, length=1):
    pos = f.tell()
    try:
        return f.read(length)
    finally:
        f.seek(pos)  # restore the position even if read() raises

with open('file.txt') as f:
    print(peek(f, 4))  # cat1
    print(peek(f, 4))  # cat1
```
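For binary files there is in fact a builtin for this: opening a file with "rb" returns an io.BufferedReader, whose peek() method returns buffered bytes without advancing the position (sys.stdin.buffer is typically a BufferedReader too, so this also works on a non-seekable stdin). peek() may return more or fewer bytes than requested, since it does at most one raw read, so the sketch below slices the result. The helper name peek_bytes and the temporary file are illustrative assumptions:

```python
import os
import tempfile


def peek_bytes(f, n=1):
    """Return up to n upcoming bytes from a buffered binary stream without consuming them.

    The result can be shorter than n near EOF or at a buffer boundary.
    """
    return f.peek(n)[:n]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")  # throwaway file for the demo
    with open(path, "wb") as out:
        out.write(b"cat1\ncat2\n")
    with open(path, "rb") as f:           # "rb" gives an io.BufferedReader
        peeked = peek_bytes(f, 4)         # does not move the position
        read = f.read(4)                  # still starts at the beginning
```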
The more_itertools library offers a peekable class that allows you to peek() ahead without advancing an iterable.
```python
import more_itertools as mit

with open("file.txt", "r") as f:
    p = mit.peekable(f.readlines())

p.peek()
# 'cat1\n'
next(p)
# 'cat1\n'
```
peek() showed the next line before next() advanced the iterable p. Calling peek() again now shows the following line:
```python
p.peek()
# 'cat2\n'
```
See also the more_itertools docs, as peekable also lets you prepend() items to an iterable before advancing.
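You could wrap the file up with itertools.tee and get back two independent iterators, bearing in mind the caveats stated in the documentation: once tee() has split an iterator, the original should not be used anywhere else, and tee may need significant auxiliary storage if one iterator runs far ahead of the other. For example: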
```python
from itertools import tee
import io

s = '''\
cat1
cat2
cat3
'''

with io.StringIO(s) as f:
    handle1, handle2 = tee(f)
    print(next(handle1), end='')
    print(next(handle2), end='')

# Output:
# cat1
# cat1
```
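If you only need to look at the first line, a lighter stdlib alternative to tee is to pull one item and chain it back in front of the iterator; unlike tee, there is no second iterator whose backlog has to be stored. A sketch, with peek_first as a made-up helper name:

```python
import io
import itertools


def peek_first(iterable):
    """Return (first_item, iterator that still yields first_item).

    Works on any iterator, including non-seekable ones such as sys.stdin.
    Raises StopIteration if the iterable is empty.
    """
    it = iter(iterable)
    first = next(it)
    return first, itertools.chain([first], it)


first, lines = peek_first(io.StringIO("cat1\ncat2\n"))
all_lines = list(lines)  # the peeked line is not lost
```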