I have a file in which lines are separated by a custom delimiter, say '.'. I want to read this file line by line, where each line ends at a '.'.
Here is a more efficient answer, using FileIO and bytearray, that I used for parsing a PDF file:
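import io
import re

# the end-of-line chars, separated by a `|` (logical OR)
EOL_REGEX = re.compile(b'\r\n|\r|\n')
# the end-of-file marker
EOF = b'%%EOF'

def readlines(fio):
    buf = bytearray(4096)
    pending = b''  # unfinished line carried over between reads
    while True:
        n = fio.readinto(buf)  # bytes actually read; 0 at end of file
        if not n:
            break  # real end of file reached before the EOF marker
        pending += bytes(buf[:n])  # ignore stale bytes left in buf
        pos = pending.find(EOF)
        if pos != -1:
            pending = pending[:pos]  # drop the marker and what follows
            break
        if pending.endswith(b'\r'):
            continue  # wait for the next read: a '\n' may follow
        # the last piece may be incomplete, so keep it for the next read
        *lines, pending = EOL_REGEX.split(pending)
        yield from lines
    yield from EOL_REGEX.split(pending)

with io.FileIO("test.pdf") as fio:
    for line in readlines(fio):
        ...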
The above example also handles a custom EOF. If you don't want that, use this:
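import io
import re

# the end-of-line chars, separated by a `|` (logical OR)
EOL_REGEX = re.compile(b'\r\n|\r|\n')

def readlines(fio):
    buf = bytearray(4096)
    pending = b''  # unfinished line carried over between reads
    while True:
        # readinto() returns the number of bytes read, 0 at end of file,
        # so the file size does not have to be looked up separately
        n = fio.readinto(buf)
        if not n:
            break
        pending += bytes(buf[:n])  # ignore stale bytes left in buf
        if pending.endswith(b'\r'):
            continue  # wait for the next read: a '\n' may follow
        # the last piece may be incomplete, so keep it for the next read
        *lines, pending = EOL_REGEX.split(pending)
        yield from lines
    yield from EOL_REGEX.split(pending)

with io.FileIO("test.pdf") as fio:
    for line in readlines(fio):
        ...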
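The code above splits on the usual end-of-line characters. To split on a custom delimiter such as the '.' from the question, the same readinto approach could probably be adapted along these lines. This is only a sketch: read_records, delimiter, and bufsize are names made up for illustration, and io.BytesIO stands in for a real file so the example runs on its own.

```python
import io

def read_records(fio, delimiter=b".", bufsize=4096):
    """Yield delimiter-separated records from a binary file object.

    Hypothetical helper, not part of the answer above.
    """
    buf = bytearray(bufsize)
    pending = b""  # data after the last delimiter seen so far
    while True:
        n = fio.readinto(buf)  # number of bytes read; 0 at end of file
        if not n:
            break
        pending += bytes(buf[:n])
        # Everything before the last delimiter is complete. The tail is
        # kept, so a delimiter cut in half by a read is still found later.
        *records, pending = pending.split(delimiter)
        yield from records
    if pending:
        yield pending  # trailing data without a final delimiter

# A tiny buffer forces several reads, which exercises the carry-over logic.
print(list(read_records(io.BytesIO(b"one.two.three"), bufsize=4)))
# [b'one', b'two', b'three']
```

With a real file, the io.BytesIO object would be replaced by io.FileIO("test.pdf") or any other object that supports readinto.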
You could use a generator:
def myreadlines(f, newline):
    buf = ""
    while True:
        # yield every complete line currently in the buffer
        while newline in buf:
            pos = buf.index(newline)
            yield buf[:pos]
            buf = buf[pos + len(newline):]
        chunk = f.read(4096)
        if not chunk:
            # end of file: whatever is left is the last line
            yield buf
            break
        buf += chunk

with open('file') as f:
    for line in myreadlines(f, "."):
        print(line)
The easiest way would be to preprocess the file to generate newlines where you want.
Here's an example using perl (assuming you want the string 'abc' to be the newline):
perl -pe 's/abc/\n/g' text.txt > processed_text.txt
If you also want to ignore the original newlines, use the following instead:
perl -ne 's/\n//; s/abc/\n/g; print' text.txt > processed_text.txt
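If perl is not available, roughly the same preprocessing could be done in Python. The following is a sketch rather than an exact port: preprocess, delimiter, and drop_newlines are illustrative names, and like the perl commands it works one input line at a time, so a delimiter that is itself split across two input lines is not replaced.

```python
def preprocess(src_path, dst_path, delimiter="abc", drop_newlines=False):
    """Write src_path to dst_path with each delimiter turned into a newline.

    Hypothetical helper mirroring the perl one-liners above.
    """
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if drop_newlines:
                # like s/\n// in the second perl command
                line = line.rstrip("\n")
            # like s/abc/\n/g
            dst.write(line.replace(delimiter, "\n"))
```

Calling preprocess("text.txt", "processed_text.txt") would correspond to the first command, and passing drop_newlines=True to the second. The processed file can then be read with an ordinary for loop over the file object.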