Change delimiter on “for each” loops on Strings in python

小鲜肉 2020-12-03 19:39

I need to read an input text file in Python by streaming it line by line, that is, loading the file one line at a time instead of all at once into memory. But my line delimiters

2 answers
  • 2020-12-03 20:09

    Python doesn't have a native construct for this. You can write a generator that reads the characters one at a time and accumulates them until you have a whole delimited item.

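    def items(infile, delim):
        item = []                  # characters of the item being built
        c = infile.read(1)
        while c:                   # read(1) returns "" at end of file
            if c == delim:
                yield "".join(item)
                item = []
            else:
                item.append(c)
            c = infile.read(1)     # advance on every iteration
        yield "".join(item)        # last item (empty if the file ends with delim)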
    
    with open("log.txt") as infile:
        for item in items(infile, ","):   # comma delimited
            do_something_with(item)
    

    You will get better performance if you read the file in chunks (say, 64K or so) and split these. However, the logic for this is more complicated since an item may be split across chunks, so I won't go into it here as I'm not 100% sure I'd get it right. :-)

  • 2020-12-03 20:23
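    import re

    def open_delimited(filename, delimiter, chunksize=1024, *args, **kwargs):
        # delimiter is passed to re.split, so it is treated as a regex pattern;
        # wrap it in re.escape() to match special characters literally.
        with open(filename, *args, **kwargs) as infile:
            remainder = ''
            # Read fixed-size chunks until read() returns '' (end of file).
            for chunk in iter(lambda: infile.read(chunksize), ''):
                pieces = re.split(delimiter, remainder + chunk)
                # All pieces but the last are complete items.
                for piece in pieces[:-1]:
                    yield piece
                # The last piece may continue in the next chunk.
                remainder = pieces[-1]
            if remainder:
                yield remainder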
    
    for line in open_delimited("log.txt", delimiter='/'):
        print(repr(line))
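
One thing to watch with the function above: `re.split` interprets the delimiter as a regular expression, so a delimiter such as `.` or `|` would not be matched literally. Below is a minimal sketch of a variant that uses `str.split` instead, so the delimiter is always a literal string. The name `iter_delimited` and the choice to take an already-open text file object (rather than a filename) are my own assumptions for illustration, not part of the answer above. Like the original, it does not yield a trailing empty item when the input ends with the delimiter.

```python
import io

def iter_delimited(infile, delimiter, chunksize=1024):
    """Yield delimiter-separated items from an open text file, read in chunks.

    Hypothetical helper: the delimiter is matched literally via str.split.
    """
    if not delimiter:
        raise ValueError("delimiter must be a non-empty string")
    remainder = ""
    while True:
        chunk = infile.read(chunksize)
        if not chunk:  # an empty string signals end of file
            break
        pieces = (remainder + chunk).split(delimiter)
        # Every piece except the last is complete. The last one may be
        # continued (or may end in a partial delimiter) in the next chunk.
        for piece in pieces[:-1]:
            yield piece
        remainder = pieces[-1]
    if remainder:
        yield remainder

# A tiny chunk size forces items and the delimiter to straddle chunk boundaries.
sample = io.StringIO("alpha||beta||gamma")
result = list(iter_delimited(sample, "||", chunksize=3))
print(result)
```

Because the unfinished tail of each chunk is carried over into the next split, a multi-character delimiter that is cut in half by a chunk boundary is still recognised once the next chunk arrives.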
    