How would one write a regular expression to use in Python to split paragraphs?
A paragraph is defined by two line breaks (\n). But one can have any amount of spaces/ta
Not a regexp but really elegant:
from itertools import groupby

def paragraph(lines):
    # Group consecutive lines by whether they are whitespace-only;
    # groups of blank lines act as separators and are skipped.
    for group_separator, line_iteration in groupby(lines.splitlines(True), key=str.isspace):
        if not group_separator:
            yield ''.join(line_iteration)

for p in paragraph('p1\n\t\np2\t\n\tstill p2\t \n \n\tp3'):
    print(repr(p))
'p1\n'
'p2\t\n\tstill p2\t \n'
'\tp3'
Stripping the output as you need it is up to you, of course.
Inspired by the famous "Python Cookbook" ;-)
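
Since the question asked for a regexp, here is a minimal sketch of that route too. It assumes a paragraph break is a newline, then optional whitespace, then another newline; the question text is cut off, so that reading is a guess. The name split_paragraphs is made up for illustration, and unlike the generator above it strips each paragraph and drops empty ones.

```python
import re

# Assumed separator: a newline, any run of whitespace (spaces, tabs,
# further blank lines), then another newline.
PARAGRAPH_BREAK = re.compile(r'\n\s*\n')

def split_paragraphs(text):
    """Split text on blank (whitespace-only) lines, stripping each paragraph."""
    pieces = PARAGRAPH_BREAK.split(text)
    # Drop pieces that are empty after stripping (e.g. leading blank lines).
    return [piece.strip() for piece in pieces if piece.strip()]

print(split_paragraphs('p1\n\t\np2\t\n\tstill p2\t \n \n\tp3'))
```

Because \s* also matches newlines, several blank lines in a row count as a single break. If you need the trailing newlines kept, as in the groupby version, the generator above is probably the better fit.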