Python regular expression to split paragraphs

执笔经年 2021-01-19 01:40

How would one write a regular expression to use in Python to split paragraphs?

A paragraph is defined by two line breaks (\n), but there can be any amount of spaces/tabs …
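The definition above can be expressed fairly directly as a regex. A minimal sketch, assuming Unix `\n` line endings and treating a paragraph break as a newline followed by one or more lines that contain only spaces/tabs; `split_paragraphs` and `PARAGRAPH_BREAK` are hypothetical names used only for illustration:

```python
import re

# Assumption: a break is "\n" followed by one or more whitespace-only lines,
# so "\n\n", "\n \t \n" and "\n\n\n" all separate paragraphs.
PARAGRAPH_BREAK = re.compile(r'\n(?:[ \t]*\n)+')

def split_paragraphs(text):
    """Split text on paragraph breaks, dropping empty pieces left by leading/trailing breaks."""
    return [p for p in PARAGRAPH_BREAK.split(text) if p]

print(split_paragraphs('p1\n\t\np2\t\n\tstill p2\t   \n     \n\tp3'))
# → ['p1', 'p2\t\n\tstill p2\t   ', '\tp3']
```

Because the match consumes the newline that ends each paragraph, the pieces come back without it; only the last piece keeps a trailing `\n` if the text ends with a single one.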

5 answers
  •  佛祖请我去吃肉
    2021-01-19 02:04

    Not a regexp, but really elegant:

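    from itertools import groupby

    def paragraph(lines):
        # splitlines(True) keeps each line's ending, so paragraphs are rejoined verbatim;
        # groupby() clusters consecutive lines by whether they are whitespace-only
        for group_separator, line_iteration in groupby(lines.splitlines(True), key=str.isspace):
            if not group_separator:  # skip runs of blank/whitespace-only lines
                yield ''.join(line_iteration)

    for p in paragraph('p1\n\t\np2\t\n\tstill p2\t   \n     \n\tp3'):
        print(repr(p))

    # Output:
    # 'p1\n'
    # 'p2\t\n\tstill p2\t   \n'
    # '\tp3'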
    

    Of course, it's up to you to strip the output as needed.
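One possible way to do that stripping, assuming you want every line of a paragraph trimmed of leading/trailing spaces and tabs while keeping the internal line breaks; `strip_paragraphs` is a hypothetical helper, and the input list is the output printed above:

```python
def strip_paragraphs(paragraphs):
    """Strip each line of each paragraph and drop the paragraph's trailing newline."""
    return ['\n'.join(line.strip() for line in p.splitlines()) for p in paragraphs]

raw = ['p1\n', 'p2\t\n\tstill p2\t   \n', '\tp3']
print(strip_paragraphs(raw))
# → ['p1', 'p2\nstill p2', 'p3']
```

If you only need the paragraph as a whole trimmed (indentation inside it preserved), `p.strip()` on each item is enough.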

    Inspired by the famous "Python Cookbook" ;-)
