I usually read files like this in Python:
f = open('filename.txt', 'r')
for x in f:
    doStuff(x)
f.close()
However, this splits the file on newlines, whereas I want to iterate over items separated by a different delimiter (commas, in my case).
The following function is a fairly straightforward way to do what you want:
def file_split(f, delim=',', bufsize=1024):
    prev = ''  # partial item carried over from the previous block
    while True:
        s = f.read(bufsize)
        if not s:  # read() returns '' at end-of-file
            break
        split = s.split(delim)
        if len(split) > 1:
            # The first piece completes the carried-over partial item.
            yield prev + split[0]
            # The last piece may be cut off mid-item; carry it over.
            prev = split[-1]
            for x in split[1:-1]:
                yield x
        else:
            # No delimiter in this block; keep accumulating.
            prev += s
    if prev:
        yield prev
You would use it like this:
for item in file_split(open('filename.txt')):
    doStuff(item)
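Note that the usage above never closes the file. If that matters, a with block (a small variation, not part of the snippet above) closes it automatically when the loop finishes:
with open('filename.txt') as f:
    for item in file_split(f):
        doStuff(item)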
This should be faster than the solution EMS linked, and for large files it will save a lot of memory compared to reading the entire file in at once.
Open the file using open(), then use the file.read(x) method to read (approximately) the next x bytes from the file. You could keep requesting blocks of 4096 characters until you hit end-of-file.
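A minimal sketch of that read loop (the filename and the process() callback are placeholders, not part of this answer):
with open('filename.txt') as f:
    while True:
        block = f.read(4096)  # read up to 4096 characters
        if not block:         # read() returns '' at end-of-file
            break
        process(block)        # hypothetical handler for each block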
You will have to implement the splitting yourself - you can take inspiration from the csv module, but I don't believe you can use it directly because it wasn't designed to deal with extremely long lines.
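A sketch of what that manual splitting might look like - essentially the same idea as the file_split generator above, shown here as a compact variant (split_on_delim is a hypothetical name):
def split_on_delim(f, delim=',', bufsize=4096):
    # Yields delimiter-separated items while holding at most one
    # block (plus a partial item) in memory at a time.
    leftover = ''
    while True:
        block = f.read(bufsize)
        if not block:
            break
        pieces = (leftover + block).split(delim)
        leftover = pieces.pop()  # last piece may be cut off mid-item
        for piece in pieces:
            yield piece
    if leftover:
        yield leftover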