Make BeautifulSoup handle line breaks as a browser would


一向 2021-01-17 21:06

I'm using BeautifulSoup (version '4.3.2' with Python 3.4) to convert HTML documents to text. The problem I'm having is that sometimes web pages have newline characters

2 answers

囚心锁ツ (OP)

2021-01-17 21:43

get_text with a separator might be helpful here:

>>> from bs4 import BeautifulSoup
>>> doc = "<p>This is a paragraph.</p><p>This is another paragraph.</p>"
>>> soup = BeautifulSoup(doc, "html.parser")
>>> soup.get_text(separator="\n")
'This is a paragraph.\nThis is another paragraph.'
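Note that get_text(separator="\n") also keeps any newline characters that sit inside the text nodes themselves, whereas a browser collapses those to a single space and only breaks lines at <br> and block-level elements. A minimal sketch of that behaviour follows. The function name browser_like_text and the BLOCK_TAGS set are hypothetical helpers, not part of BeautifulSoup, and the tag list is an assumption that is far from complete. The sketch also drops empty lines, so consecutive <br> tags do not produce a blank line the way they would in a browser.

```python
import re

from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Assumption: a small, incomplete set of tags treated as block-level.
BLOCK_TAGS = {"p", "div", "li", "tr", "blockquote",
              "h1", "h2", "h3", "h4", "h5", "h6"}
# Tags whose contents a browser does not render as text.
SKIP_TAGS = {"script", "style"}


def browser_like_text(html):
    """Hypothetical helper: extract text, breaking lines roughly as a browser would."""
    soup = BeautifulSoup(html, "html.parser")
    parts = []

    def walk(node):
        for child in node.children:
            if isinstance(child, Comment):
                continue  # HTML comments are never displayed
            if isinstance(child, NavigableString):
                # Collapse whitespace runs, including source newlines, to one space.
                parts.append(re.sub(r"\s+", " ", str(child)))
            elif isinstance(child, Tag):
                if child.name in SKIP_TAGS:
                    continue
                if child.name == "br":
                    parts.append("\n")  # explicit line break
                elif child.name in BLOCK_TAGS:
                    parts.append("\n")  # block elements start on their own line...
                    walk(child)
                    parts.append("\n")  # ...and end it
                else:
                    walk(child)  # inline element: keep text in the current line

    walk(soup)
    # Trim spaces around each line and drop lines that ended up empty.
    lines = (line.strip() for line in "".join(parts).split("\n"))
    return "\n".join(line for line in lines if line)


html = "<p>This is a\nparagraph.</p><p>This is<br>another\n paragraph.</p>"
print(browser_like_text(html))
```

Here the newline inside the first <p> becomes a space, while the <br> and the paragraph boundaries each start a new line.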
