Python - how to separate paragraphs from text?

花落未央 2020-12-21 13:05

I need to split a text into paragraphs and be able to work with each of them separately. How can I do that? Any two paragraphs are separated by at least one empty line, like this:

<
5 Answers
  • 2020-12-21 13:22

    This should work:

    text.split('\n\n')
    
  • 2020-12-21 13:25

    This worked for me:

    text = " ".join(text.splitlines())  # join all lines, keeping a space between them
    # replace '.' with whatever usually separates sentences in your text (a period, question mark, etc.)
    sentences = text.split('.')
    
  • 2020-12-21 13:33

    I usually split then filter out the '' and strip. ;)

    a =\
    '''
    Hello world,
      this is an example.
    
    Let´s program something.
    
    
    Creating  new  program.
    
    
    '''
    
    # keep only lines that contain something other than whitespace
    data = [content.strip() for content in a.splitlines() if content.strip()]
    
    print(data)
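
Note that the comprehension above yields individual lines rather than whole paragraphs. If grouped paragraphs are wanted, one possible sketch uses itertools.groupby to collect runs of consecutive non-blank lines. The function name group_paragraphs and the sample text are made up for illustration:

```python
from itertools import groupby


def group_paragraphs(text):
    """Group consecutive non-blank lines into paragraphs (lists of stripped lines)."""
    lines = text.splitlines()
    return [
        [line.strip() for line in group]
        # groupby yields (key, run) pairs; key is True for runs of non-blank lines
        for has_text, group in groupby(lines, key=lambda line: bool(line.strip()))
        if has_text
    ]


sample = """
Hello world,
  this is an example.

Let's program something.


Creating  new  program.

"""

for number, paragraph in enumerate(group_paragraphs(sample), start=1):
    print(number, " ".join(paragraph))
```

Each paragraph comes back as a list of its lines, so it can be joined or processed line by line as needed.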
    
  • 2020-12-21 13:34

    Try

    # drop empty pieces and the stray newlines left behind by runs of three or more "\n"
    result = [p.strip('\n') for p in text.split('\n\n') if p.strip()]
    
  • 2020-12-21 13:42

    Not an entirely trivial problem, and the standard library doesn't seem to have any ready solutions.

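    Paragraphs in your example are separated by two or more newlines, which unfortunately means text.split("\n\n") is not enough: with more than one blank line between paragraphs it leaves empty strings or stray leading newlines in the result. I think splitting with a regular expression is a workable strategy instead: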

    import fileinput
    import re
    
    NEWLINES_RE = re.compile(r"\n{2,}")  # two or more "\n" characters
    
    def split_paragraphs(input_text=""):
        stripped_text = input_text.strip("\n")  # remove leading and trailing "\n"
        split_text = NEWLINES_RE.split(stripped_text)  # regex splitting
    
        paragraphs = [p + "\n" for p in split_text if p.strip()]
        # p + "\n" ensures that every paragraph ends with a newline
        # p.strip() is truthy only if the paragraph contains non-whitespace characters
    
        return paragraphs
    
    # sample code, to split all script input files into paragraphs
    text = "".join(fileinput.input())
    for paragraph in split_paragraphs(text):
        print(f"<<{paragraph}>>\n")
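
One case the pattern above does not cover is a "blank" line that actually contains spaces or tabs, since it only matches consecutive "\n" characters. A possible variation, offered as a sketch rather than a definitive fix, allows whitespace between the newlines. The names BLANK_LINES_RE and split_paragraphs_ws are hypothetical, and Windows-style "\r\n" line endings are assumed not to occur:

```python
import re

# a newline, optional whitespace (spaces, tabs, further newlines), then a newline
BLANK_LINES_RE = re.compile(r"\n\s*\n")


def split_paragraphs_ws(text):
    """Split on blank lines, treating whitespace-only lines as blank."""
    pieces = BLANK_LINES_RE.split(text)
    # skip whitespace-only pieces; trim leftover newlines at the edges
    return [p.strip("\n") for p in pieces if p.strip()]


sample = "First line\nsecond line\n \n\t\nNext paragraph\n\n\n\nLast one\n"
for paragraph in split_paragraphs_ws(sample):
    print(f"<<{paragraph}>>")
```

Unlike the version above, this sketch returns paragraphs without a trailing newline; append "\n" to each one if that matters for your use.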
    

    Edited to add:

    It is probably cleaner to use a state machine approach. Here's a fairly simple example using a generator function, which has the added benefit of streaming through the input one line at a time, and not storing complete copies of the input in memory:

    import fileinput
    
    def split_paragraph2(input_lines):
        paragraph = []  # store current paragraph as a list
        for line in input_lines:
            if line.strip():  # True if line is non-empty (apart from whitespace)
                paragraph.append(line)
            elif paragraph:  # If we see an empty line, return paragraph (if any)
                yield "".join(paragraph)
                paragraph = []
        if paragraph:  # After end of input, return final paragraph (if any)
            yield "".join(paragraph)
    
    # sample code, to split all script input files into paragraphs
    for paragraph in split_paragraph2(fileinput.input()):
        print(f"<<{paragraph}>>\n")
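
The generator accepts any iterable of lines, not only fileinput. As a sketch of working with each paragraph when the text is already in memory, the string can be wrapped in io.StringIO, which iterates line by line like an open file. The name iter_paragraphs and the sample text are made up here; the function repeats the logic of split_paragraph2 only so that the snippet runs on its own:

```python
import io


def iter_paragraphs(lines):
    """Yield paragraphs from an iterable of lines (same logic as split_paragraph2)."""
    paragraph = []
    for line in lines:
        if line.strip():
            paragraph.append(line)
        elif paragraph:
            yield "".join(paragraph)
            paragraph = []
    if paragraph:
        yield "".join(paragraph)


text = "First paragraph,\nsecond line.\n\n\nSecond paragraph.\n"

# process each paragraph separately, here by counting its words
for number, paragraph in enumerate(iter_paragraphs(io.StringIO(text)), start=1):
    word_count = len(paragraph.split())
    print(f"{number}: {word_count} words")
```

Passing text.splitlines(keepends=True) instead of io.StringIO(text) should give the same paragraphs, at the cost of building the full list of lines first.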
    