Read file into array separated by paragraph (Python)

半阙折子戏 2021-02-08 22:11

I have a text file that I want to read into three different arrays: array1, array2, and array3. The first paragraph gets put in array1, the second paragraph gets put in arr

7 Answers
  • 2021-02-08 22:29
    import itertools as it


    def paragraphs(fileobj, separator='\n'):
        """Iterate over a file object one paragraph at a time."""
        # Makes no assumptions about the encoding used in the file
        lines = []
        for line in fileobj:
            if line == separator:
                # A blank line ends the current paragraph; repeated blank lines are skipped
                if lines:
                    yield ''.join(lines)
                    lines = []
            else:
                lines.append(line)
        # Emit the last paragraph if the file does not end with a blank line
        if lines:
            yield ''.join(lines)

    paragraph_lists = [[], [], []]
    with open('/Users/robdev/Desktop/test.txt') as f:
        paras = paragraphs(f)
        # Deal the paragraphs round-robin into the three lists (Python 3: zip replaces izip)
        for para, group in zip(paras, it.cycle(paragraph_lists)):
            group.append(para)

    print(paragraph_lists)
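
    To see what the zip/cycle pairing does without touching a file, here is a minimal sketch on an in-memory list. The names paras and groups and the sample strings are made up for illustration; they are not from the answer above.

```python
import itertools as it

# Hypothetical paragraphs standing in for the output of a paragraph reader
paras = ["p1", "p2", "p3", "p4", "p5"]
groups = [[], [], []]

# cycle() repeats the three lists forever; zip() stops when paras runs out
for para, group in zip(paras, it.cycle(groups)):
    group.append(para)

print(groups)  # [['p1', 'p4'], ['p2', 'p5'], ['p3']]
```

    Paragraph 1 lands in the first list, paragraph 2 in the second, paragraph 3 in the third, and then the pattern wraps around.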
    
  • 2021-02-08 22:30

    A more elegant way that avoids slices:

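    import itertools

    def grouper(n, iterable, fillvalue=None):
        # n references to the same iterator, so each tuple takes n consecutive items
        args = [iter(iterable)] * n
        return itertools.zip_longest(*args, fillvalue=fillvalue)

    # text is assumed to already hold the contents of the file as one string
    for p in grouper(5, [sent.strip() for sent in text.split('\n') if sent != '']):
        print(p)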
    

    Just make sure you handle the None fill values that pad the final group.
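
    One possible way to deal with that padding, as a sketch only: use a private sentinel as the fill value and filter it out afterwards, so a genuine None in the data is not dropped by accident. The helper name grouper_no_padding is hypothetical.

```python
from itertools import zip_longest


def grouper(n, iterable, fillvalue=None):
    # n references to the same iterator: each tuple takes n consecutive items
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


def grouper_no_padding(n, iterable):
    """Like grouper, but strip the padding from the last group (hypothetical helper)."""
    sentinel = object()  # unique marker that cannot collide with real data
    for group in grouper(n, iterable, fillvalue=sentinel):
        yield [item for item in group if item is not sentinel]


chunks = list(grouper_no_padding(2, ["a", "b", "c"]))
print(chunks)  # [['a', 'b'], ['c']]
```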

  • 2021-02-08 22:32

    I know this question was asked a long time ago, but I am adding my input in case it is useful to someone else. I found a much easier way to split the input file into paragraphs based on the paragraph separator (it can be \n, a blank space, or anything else). The code snippet for your question is given below:

    with open("input.txt", "r") as infile:   # avoid shadowing the built-in input()
        input_ = infile.read().split("\n\n")   # "\n\n" marks the blank line between paragraphs
    

    After running this, input_[0] holds the first paragraph, input_[1] the second, and so on. In other words, all the paragraphs in the input file end up in a list, with each list element containing one paragraph.
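
    A plain split("\n\n") leaves empty or whitespace-only entries when paragraphs are separated by more than one blank line, or by a "blank" line that contains spaces. A minimal sketch that tolerates both, assuming a regular expression is acceptable here (split_paragraphs and the sample text are made up for illustration):

```python
import re


def split_paragraphs(text):
    """Split text on runs of blank lines, ignoring stray whitespace (hypothetical helper)."""
    # \n\s*\n matches a newline, any whitespace (including extra newlines), then a newline
    parts = re.split(r"\n\s*\n", text.strip())
    return [p for p in parts if p]


sample = "first para\nstill first\n\n\nsecond para\n \nthird para\n"
result = split_paragraphs(sample)
print(result)  # ['first para\nstill first', 'second para', 'third para']
```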

  • 2021-02-08 22:35

    Because I feel like showing off:

    with open('data.txt') as f:
        f = list(f)  # note: a list of lines, so this deals out lines rather than paragraphs
        a, b, c = (list(__import__('itertools').islice(f, i, None, 3)) for i in range(3))
    
  • 2021-02-08 22:35

    This code extracts the text between two marker strings in each file:

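    rr = []  # list for saving the extracted text
    for f in file_list:  # file_list is assumed to be an iterable of file paths
        with open(f, 'rt') as fl:
            lines = fl.read()
            # Keep the text from 'String1' up to (but not including) 'String2'
            lines = lines[lines.find('String1'):lines.find('String2')]
            rr.append(lines)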
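
    One thing to watch: str.find returns -1 when the marker is missing, and a -1 in a slice silently gives the wrong text instead of an error. A hedged sketch of a helper that checks for that (the name between and the choice to return an empty string are assumptions, not part of the answer above):

```python
def between(text, start, end):
    """Return text from `start` up to (not including) `end`, or '' if either is missing."""
    i = text.find(start)
    if i == -1:
        return ""
    # Look for the end marker only after the start marker
    j = text.find(end, i + len(start))
    if j == -1:
        return ""
    return text[i:j]


print(repr(between("aa String1 body String2 zz", "String1", "String2")))  # 'String1 body '
```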
    
  • 2021-02-08 22:37

    Using slices would also work.

    par_separator = "\n\n"
    paragraphs = "1\n\n2\n\n3\n\n4\n\n5\n\n6".split(par_separator)
    # Every third paragraph, starting at offsets 0, 1 and 2
    a, b, c = paragraphs[0::3], paragraphs[1::3], paragraphs[2::3]
    

    Slice syntax: [start index:end index:step]
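
    As a worked example of the step argument, here is a small sketch with seven made-up paragraph labels, so the three results have different lengths:

```python
# Hypothetical paragraph labels; any list behaves the same way
paragraphs = ["p1", "p2", "p3", "p4", "p5", "p6", "p7"]

# Omitting the end index means "to the end of the list"; the step of 3 skips two items each time
a, b, c = (paragraphs[i::3] for i in range(3))

print(a)  # ['p1', 'p4', 'p7']
print(b)  # ['p2', 'p5']
print(c)  # ['p3', 'p6']
```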
