Reading a file with a specified delimiter for newline

遇见更好的自我 2020-12-01 12:43

I have a file in which lines are separated by a custom delimiter, say '.'. I want to read this file line by line, where each line ends at an occurrence of '.'.

3 answers
  • 2020-12-01 13:16

    Here is a more efficient approach, using io.FileIO and a reusable bytearray buffer, which I used for parsing a PDF file:

    import io
    import re
    
    
    # the end-of-line chars, separated by a `|` (logical OR)
    EOL_REGEX = b'\r\n|\r|\n'  
    
    # the end-of-file marker
    EOF = b'%%EOF'
    
    
    
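    def readlines(fio):
        buf = bytearray(4096)
        pending = b''  # incomplete line carried over from the previous chunk
        while True:
            n = fio.readinto(buf)  # number of bytes actually read
            if not n:
                break  # end of file reached without finding the EOF marker
            pending += bytes(buf[:n])  # only the first n bytes are fresh data
            eof_pos = pending.find(EOF)
            if eof_pos != -1:
                pending = pending[:eof_pos]  # drop the marker and what follows
                break
            # hold back a trailing '\r': its '\n' may arrive in the next chunk
            cut = len(pending) - 1 if pending.endswith(b'\r') else len(pending)
            *lines, rest = re.split(EOL_REGEX, pending[:cut])
            pending = rest + pending[cut:]
            yield from lines
        yield from re.split(EOL_REGEX, pending)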
    
    
    with io.FileIO("test.pdf") as fio:
        for line in readlines(fio):
            ...
    

    The above example also handles a custom EOF. If you don't want that, use this:

    import io
    import os
    import re
    
    
    # the end-of-line chars, separated by a `|` (logical OR)
    EOL_REGEX = b'\r\n|\r|\n'  
    
    
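    def readlines(fio, size):
        buf = bytearray(4096)
        pending = b''  # incomplete line carried over from the previous chunk
        while fio.tell() < size:
            n = fio.readinto(buf)  # number of bytes actually read
            if not n:
                break
            pending += bytes(buf[:n])  # only the first n bytes are fresh data
            # hold back a trailing '\r': its '\n' may arrive in the next chunk
            cut = len(pending) - 1 if pending.endswith(b'\r') else len(pending)
            *lines, rest = re.split(EOL_REGEX, pending[:cut])
            pending = rest + pending[cut:]
            yield from lines
        if pending:
            yield from re.split(EOL_REGEX, pending)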
    
    size = os.path.getsize("test.pdf")
    with io.FileIO("test.pdf") as fio:
        for line in readlines(fio, size):
             ...
    
  • 2020-12-01 13:22

    You could use a generator:

    def myreadlines(f, newline):
      buf = ""
      while True:
        while newline in buf:
          pos = buf.index(newline)
          yield buf[:pos]
          buf = buf[pos + len(newline):]
        chunk = f.read(4096)
        if not chunk:
          yield buf
          break
        buf += chunk
    
    with open('file') as f:
      for line in myreadlines(f, "."):
        print(line)
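
To see how this kind of buffering behaves when a delimiter falls on a chunk boundary, here is a small self-contained sketch of the same idea in Python 3. The name `split_on` and the `chunk_size` parameter are my own and not part of the answer above; unlike the generator above, this variant skips an empty trailing piece. The tiny chunk size is only there to force records to span several reads.

```python
import io


def split_on(f, delimiter, chunk_size=4096):
    """Yield pieces of the text stream `f` separated by `delimiter`."""
    buf = ""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        # Everything before the last delimiter is complete; the tail may be
        # an unfinished record (or half a delimiter), so keep it in buf.
        *records, buf = buf.split(delimiter)
        yield from records
    if buf:
        yield buf  # trailing text with no closing delimiter


# In-memory stand-in for a file; note that '\n' is treated as ordinary text.
sample = io.StringIO("first line.second\nline.third")
records = list(split_on(sample, ".", chunk_size=4))
print(records)  # ['first line', 'second\nline', 'third']
```

Because the unfinished tail stays in `buf` until the next read, a multi-character delimiter that straddles two chunks is still found once the second chunk arrives.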
    
  • 2020-12-01 13:30

    The easiest way would be to preprocess the file to generate newlines where you want.

    Here's an example using Perl (assuming you want the string 'abc' to act as the newline; a literal '.' would have to be escaped as '\.' in the pattern):

    perl -pe 's/abc/\n/g' text.txt > processed_text.txt
    

    If you also want to ignore the original newlines, use the following instead:

    perl -ne 's/\n//; s/abc/\n/g; print' text.txt > processed_text.txt
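
If Perl is not available, the same preprocessing can be approximated in Python. This is only a rough sketch: the function name `convert_delimiter` and its `keep_newlines` flag are hypothetical, it reads the whole file into memory, and it treats the delimiter as a plain string rather than a regular expression.

```python
import os
import tempfile


def convert_delimiter(src_path, dst_path, delimiter, keep_newlines=False):
    """Copy src_path to dst_path, turning each `delimiter` into a newline."""
    with open(src_path) as src:
        text = src.read()  # whole file in memory; fine for modest sizes
    if not keep_newlines:
        # Mirrors the second Perl command: drop the original line breaks.
        text = text.replace("\n", "")
    text = text.replace(delimiter, "\n")
    with open(dst_path, "w") as dst:
        dst.write(text)


# Small demonstration using temporary files.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "text.txt")
    dst = os.path.join(tmp, "processed_text.txt")
    with open(src, "w") as f:
        f.write("oneabctwo\nstill twoabcthree")
    convert_delimiter(src, dst, "abc")
    with open(dst) as f:
        result = f.read().split("\n")
print(result)  # ['one', 'twostill two', 'three']
```

Once the file has been rewritten this way, it can be read with an ordinary `for line in f` loop.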
    