I'm using BeautifulSoup (version 4.3.2 with Python 3.4) to convert HTML documents to text. The problem I'm having is that sometimes web pages have newline characters
get_text might be helpful here:
>>> from bs4 import BeautifulSoup
>>> doc = "<p>This is a paragraph.</p><p>This is another paragraph.</p>"
>>> soup = BeautifulSoup(doc)
>>> soup.get_text(separator="\n")
u'This is a paragraph.\nThis is another paragraph.'
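If the page's own text nodes also contain stray newlines, one possible approach is to collapse whitespace inside each text node before joining them. The following is only a sketch: the helper name html_to_text is hypothetical, and the choice of "html.parser" as the parser is an assumption (any installed parser should work similarly).

```python
from bs4 import BeautifulSoup


def html_to_text(html):
    """Hypothetical helper: one output line per non-empty text node."""
    # "html.parser" is the parser bundled with Python; chosen here as an assumption.
    soup = BeautifulSoup(html, "html.parser")
    # stripped_strings skips whitespace-only nodes and strips the rest;
    # split()/join() then collapses newlines and runs of spaces inside a node.
    lines = (" ".join(text.split()) for text in soup.stripped_strings)
    return "\n".join(lines)


doc = "<p>This is a\nparagraph.</p>\n<p>This is another paragraph.</p>"
print(html_to_text(doc))
```

Note that this treats every text node as its own line, so inline tags such as <b> inside a paragraph would also introduce line breaks; whether that is acceptable depends on the documents being converted.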