I need to split a text into paragraphs and be able to work with each of them separately. How can I do that? Any two paragraphs are separated by at least one empty line. Like this:
This should work:
text.split('\n\n')
This worked for me:
text = "".join(text.splitlines())
text.split('something that is almost always used to separate sentences (i.e. a period, question mark, etc.)')
I usually split then filter out the '' and strip. ;)
a =\
'''
Hello world,
this is an example.
Let´s program something.
Creating new program.
'''
data = [content.strip() for content in a.splitlines() if content]
print(data)
Try
result = list(filter(lambda x : x != '', text.split('\n\n')))
Not an entirely trivial problem, and the standard library doesn't seem to have any ready solutions.
Paragraphs in your example are separated by at least two newlines, which unfortunately makes text.split("\n\n")
unreliable: a run of three or more newlines leaves stray newlines or empty strings in the result. I think that splitting by a regular expression is a workable strategy instead:
import fileinput
import re
NEWLINES_RE = re.compile(r"\n{2,}")  # two or more "\n" characters
def split_paragraphs(input_text=""):
    no_newlines = input_text.strip("\n")  # remove leading and trailing "\n"
    split_text = NEWLINES_RE.split(no_newlines)  # regex splitting
    # p + "\n" ensures that every paragraph ends with a newline
    # p.strip() is truthy only if the paragraph has characters other than whitespace
    paragraphs = [p + "\n" for p in split_text if p.strip()]
    return paragraphs
# sample code, to split all script input files into paragraphs
text = "".join(fileinput.input())
for paragraph in split_paragraphs(text):
    print(f"<<{paragraph}>>\n")
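To see why a plain text.split("\n\n") is not enough here, this minimal sketch compares it with the regex split on a made-up sample string. The names sample, naive and robust are illustrative only and do not appear in the code above:

```python
import re

# Made-up sample: paragraphs separated by 3 and then 4 newlines
sample = "first paragraph\n\n\nsecond paragraph\n\n\n\nthird paragraph"

# Splitting on exactly two newlines leaves a stray "\n" and an empty string
naive = sample.split("\n\n")
print(naive)
# ['first paragraph', '\nsecond paragraph', '', 'third paragraph']

# Splitting on runs of two or more newlines gives clean paragraphs
robust = re.split(r"\n{2,}", sample)
print(robust)
# ['first paragraph', 'second paragraph', 'third paragraph']
```

Note that neither variant treats a line containing only spaces or tabs as a separator, since the pattern matches consecutive "\n" characters only.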
Edited to add:
It is probably cleaner to use a state machine approach. Here's a fairly simple example using a generator function, which has the added benefit of streaming through the input one line at a time, and not storing complete copies of the input in memory:
import fileinput
def split_paragraph2(input_lines):
    paragraph = []  # store current paragraph as a list
    for line in input_lines:
        if line.strip():  # True if line is non-empty (apart from whitespace)
            paragraph.append(line)
        elif paragraph:  # If we see an empty line, return paragraph (if any)
            yield "".join(paragraph)
            paragraph = []
    if paragraph:  # After end of input, return final paragraph (if any)
        yield "".join(paragraph)
# sample code, to split all script input files into paragraphs
for paragraph in split_paragraph2(fileinput.input()):
    print(f"<<{paragraph}>>\n")
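For comparison, the same line-by-line idea can be written more compactly with itertools.groupby. This is a different technique from the generator above, offered only as a sketch: the function name paragraphs_by_groupby and the sample text are made up for illustration, and io.StringIO stands in for a real file so the example runs on its own.

```python
import io
from itertools import groupby

def paragraphs_by_groupby(lines):
    """Yield paragraphs from an iterable of lines (hypothetical helper)."""
    # Consecutive lines are grouped by whether they contain non-whitespace text
    for has_text, group in groupby(lines, key=lambda line: bool(line.strip())):
        if has_text:  # skip the groups made of blank lines
            yield "".join(group)

# Made-up sample; the second separator line contains only spaces
text = (
    "Hello world,\nthis is an example.\n"
    "\n\n"
    "Let's program something.\n"
    "   \n"
    "Creating new program.\n"
)

# io.StringIO yields lines lazily, like an open file would
result = list(paragraphs_by_groupby(io.StringIO(text)))
for paragraph in result:
    print(f"<<{paragraph}>>")
```

Like the generator above, this treats whitespace-only lines as separators and reads the input one line at a time.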