I need to read an input text file in Python by streaming it line by line, that is, loading one line at a time instead of the whole file at once into memory. But my line delimiters are not the standard newline characters.
Python doesn't have a built-in construct for this. You can write a generator that reads the file one character at a time and accumulates characters until a whole delimited item is available:
def items(infile, delim):
    item = []
    c = infile.read(1)
    while c:
        if c == delim:
            yield "".join(item)
            item = []
        else:
            item.append(c)
        c = infile.read(1)
    if item:
        yield "".join(item)  # final item if the file doesn't end with a delimiter
with open("log.txt") as infile:
    for item in items(infile, ","):  # comma-delimited
        do_something_with(item)
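If your delimiter is longer than one character, the same streaming idea still works by checking the tail of the accumulated characters. A minimal sketch (the `items_multichar` name and the sample data are illustrative, not part of the answer above):

```python
import io

def items_multichar(infile, delim):
    """Yield items from a stream split on a possibly multi-character
    delimiter, reading one character at a time (illustrative sketch)."""
    buf = []
    n = len(delim)
    c = infile.read(1)
    while c:
        buf.append(c)
        # When the buffer ends with the delimiter, emit everything before it.
        if len(buf) >= n and "".join(buf[-n:]) == delim:
            yield "".join(buf[:-n])
            buf = []
        c = infile.read(1)
    if buf:
        yield "".join(buf)  # trailing item with no closing delimiter

# Works on any text stream, e.g. an in-memory one:
print(list(items_multichar(io.StringIO("a::b::c"), "::")))  # -> ['a', 'b', 'c']
```

This is the same character-by-character approach, so it shares its performance characteristics; it just compares a sliding suffix instead of a single character.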
You will get better performance if you read the file in larger chunks (say, 64K or so) and split those instead. The logic is more complicated, though, since an item may be split across two chunks, so I won't go into it here as I'm not 100% sure I'd get it right. :-)
Here is a chunked version: it reads the file in blocks, splits each block with re.split, and carries the trailing (possibly incomplete) piece over to the next block.

import re

def open_delimited(filename, delimiter, chunksize=1024, *args, **kwargs):
    with open(filename, *args, **kwargs) as infile:
        remainder = ''
        for chunk in iter(lambda: infile.read(chunksize), ''):
            pieces = re.split(delimiter, remainder + chunk)
            for piece in pieces[:-1]:
                yield piece
            remainder = pieces[-1]  # may be a partial item
        if remainder:
            yield remainder
for line in open_delimited("log.txt", delimiter='/'):
    print(repr(line))
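To see why the remainder must be carried between chunks, here is a minimal self-contained sketch (the `split_stream` name, the tiny chunk size, and the sample string are illustrative) that feeds a string through in artificially small chunks:

```python
import io
import re

def split_stream(stream, delimiter, chunksize=4):
    """Split a text stream on a regex delimiter, carrying partial
    items across chunk boundaries (minimal sketch)."""
    remainder = ''
    for chunk in iter(lambda: stream.read(chunksize), ''):
        pieces = re.split(delimiter, remainder + chunk)
        yield from pieces[:-1]   # complete items
        remainder = pieces[-1]   # may continue in the next chunk
    if remainder:
        yield remainder

# "second" straddles two 4-character chunks but is still emitted whole:
print(list(split_stream(io.StringIO("first/second/third"), "/")))
# -> ['first', 'second', 'third']
```

Dropping the `remainder` carry-over would instead emit fragments like 'se' and 'cond' whenever an item crossed a chunk boundary, which is exactly the subtlety the chunked answer handles.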