Strip HTML from strings in Python

难免孤独 2020-11-22 02:50
from mechanize import Browser

br = Browser()
br.open('http://somewebpage')
html = br.response().readlines()  # raw response lines, markup included
for line in html:
    print(line)

When p

26 answers
  •  栀梦 (original poster)
    2020-11-22 03:07

    Here is a simple solution that strips HTML tags and decodes HTML entities based on the amazingly fast lxml library:

    from lxml import html
    
    def strip_html(s):
        return str(html.fromstring(s).text_content())
    
    strip_html('Ein <a href="">sch&ouml;ner</a> Text.')  # Output: Ein schöner Text.
    
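If installing lxml is not an option, roughly the same result can be sketched with only the standard library. This is a minimal sketch, not part of the answer above: the names `TagStripper` and `strip_html_stdlib` are made up for illustration, and it assumes Python 3, where `html.parser.HTMLParser` accepts `convert_charrefs`.

```python
from html.parser import HTMLParser
from io import StringIO


class TagStripper(HTMLParser):
    """Hypothetical helper: keeps only the text nodes of an HTML string."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities such as &ouml;
        super().__init__(convert_charrefs=True)
        self._buf = StringIO()

    def handle_data(self, data):
        # Called for every run of text between tags
        self._buf.write(data)

    def get_text(self):
        return self._buf.getvalue()


def strip_html_stdlib(s):
    parser = TagStripper()
    parser.feed(s)
    parser.close()  # flush any text still buffered by the parser
    return parser.get_text()
```

Calling `strip_html_stdlib('Ein <a href="">sch&ouml;ner</a> Text.')` should give the same text as the lxml version. Note that this sketch also keeps the contents of `<script>` and `<style>` elements, so it is not a sanitizer.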
