BeautifulSoup: don't add html, head and body tags automatically

青春惊慌失措 2020-12-03 09:40

When using BeautifulSoup with html5lib, it adds the html, head and body tags automatically:

    BeautifulSoup('<h1>FOO</h1>', 'html5lib') # => <html><head></head><body><h1>FOO</h1></body></html>
8 Answers
  • 2020-12-03 10:33

    Yet another solution:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup('<p>Hello <a href="http://google.com">Google</a></p><p>Hi!</p>', 'lxml')
    # Content handling example: replace the Google link with StackOverflow
    for a in soup.find_all('a'):
        a['href'] = 'http://stackoverflow.com/'
        a.string = 'StackOverflow'
    # Serialize only the direct child tags of <body>, dropping the html/body wrappers
    print(''.join(str(child) for child in soup.body.find_all(recursive=False)))
    
  • 2020-12-03 10:34

    Since Beautiful Soup v4.0.1 there is a decode_contents() method. Call it on the body tag to get the markup without the wrapper tags:

    >>> BeautifulSoup('<h1>FOO</h1>', 'html5lib').body.decode_contents()
    '<h1>FOO</h1>'
    

    More details in a solution to this question: https://stackoverflow.com/a/18602241/237105
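
As a follow-up to the decode_contents() approach, here is a minimal sketch that wraps it in a reusable helper. The function name `inner_html` and its `parser` argument are hypothetical, chosen for illustration. It assumes the fragment contains only body content: anything html5lib moves into the head (for example a title tag) would not be returned.

```python
from bs4 import BeautifulSoup


def inner_html(markup, parser="html5lib"):
    """Parse a fragment and return it without html/head/body wrappers.

    `inner_html` is an illustrative helper, not part of Beautiful Soup.
    """
    soup = BeautifulSoup(markup, parser)
    # html5lib and lxml wrap fragments in <html><body>; html.parser does not,
    # so fall back to the soup object itself when no <body> was created.
    root = soup.body if soup.body is not None else soup
    # decode_contents() serializes the children of a tag, not the tag itself.
    return root.decode_contents()
```

With html5lib installed, `inner_html('<h1>FOO</h1>')` returns `'<h1>FOO</h1>'`. Passing `parser='html.parser'` avoids the wrapper tags in the first place, at the cost of a less lenient parse than html5lib.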
