Make BeautifulSoup handle line breaks as a browser would

一向 2021-01-17 21:06

I'm using BeautifulSoup (version '4.3.2' with Python 3.4) to convert HTML documents to text. The problem I'm having is that sometimes web pages have newline characters

2 answers
  •  囚心锁ツ
    2021-01-17 21:43

    get_text might be helpful here:

    >>> from bs4 import BeautifulSoup
    >>> doc = "<p>This is a paragraph.</p><p>This is another paragraph.</p>"
    >>> soup = BeautifulSoup(doc, "html.parser")
    >>> soup.get_text(separator="\n")
    u'This is a paragraph.\nThis is another paragraph.'
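
Note that get_text(separator="\n") only inserts a newline between text nodes; newline characters that already sit inside a text node are kept as they are, whereas a browser would render them as a single space. The following is a minimal sketch of one possible workaround, not part of the original answer: it collapses whitespace inside each text node and breaks lines only at `<br>` and at a hand-picked set of block-level tags. The helper name `browser_like_text` and the `BLOCK_TAGS` set are assumptions made for illustration, and the set is not a complete list of block elements.

```python
import re

from bs4 import BeautifulSoup, Comment, NavigableString, Tag

# Assumed, incomplete set of tags treated as block-level (illustrative only).
BLOCK_TAGS = {"p", "div", "li", "ul", "ol", "tr", "blockquote",
              "h1", "h2", "h3", "h4", "h5", "h6"}
# Tags whose contents a browser does not render as page text.
SKIPPED_TAGS = {"script", "style"}


def browser_like_text(html):
    """Hypothetical helper: approximate how a browser lays out line breaks."""
    soup = BeautifulSoup(html, "html.parser")
    parts = []

    def walk(node):
        for child in node.children:
            if isinstance(child, Comment):
                continue  # comments are never rendered
            if isinstance(child, NavigableString):
                # Collapse runs of HTML whitespace (including newlines) to one space.
                parts.append(re.sub(r"[ \t\n\r\f]+", " ", str(child)))
            elif isinstance(child, Tag):
                if child.name in SKIPPED_TAGS:
                    continue
                if child.name == "br":
                    parts.append("\n")
                elif child.name in BLOCK_TAGS:
                    # Block elements start and end on their own line.
                    parts.append("\n")
                    walk(child)
                    parts.append("\n")
                else:
                    walk(child)  # inline element: keep text on the same line

    walk(soup)
    lines = []
    for line in "".join(parts).split("\n"):
        line = re.sub(r" +", " ", line).strip(" ")
        if line:  # this sketch drops empty lines, including those from repeated <br>
            lines.append(line)
    return "\n".join(lines)


html = "<p>This is a\nparagraph.</p>\n<p>This is<br>another <b>bold</b>\nparagraph.</p>"
print(browser_like_text(html))
```

For this input the sketch prints three lines: "This is a paragraph.", "This is", and "another bold paragraph.". It does not handle `<pre>`, CSS-driven layout, or tables, so it should be treated as a starting point rather than a faithful rendering.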
