Removing HTML tags from text using a regular expression in Python

感动是毒 2021-01-18 08:55

I'm trying to read an HTML file and remove all the tags so that only the text is left, but I'm having a problem with my regex. This is what I have so far.

2 answers

粉色の甜心 (original poster)

2021-01-18 09:43

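import re
from urllib.request import urlopen

# Match any tag (non-greedy) plus the two entities &nbsp; and &amp;
patjunk = re.compile(r"<.*?>|&nbsp;|&amp;", re.DOTALL)
url = "http://www.yahoo.com"

def test(url, pat):
    # Download the page and decode the bytes before applying the regex
    html = urlopen(url).read().decode("utf-8", errors="replace")
    return pat.sub("", html)

print(test(url, patjunk))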
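
One likely reason a pattern like this leaves junk behind is that it removes the <script> and <style> tags but keeps the JavaScript and CSS between them. Below is a minimal sketch of one way to handle that, still with regular expressions: drop those blocks and comments first, then strip the remaining tags, then decode entities with html.unescape from the standard library. The names strip_tags, SCRIPT_STYLE, COMMENT and TAG are hypothetical and not from the post above, and regex-based stripping is an approximation that can still fail on malformed markup.

```python
import html
import re

# Hypothetical helper names, chosen for illustration only.
# Whole <script>...</script> and <style>...</style> blocks, contents included.
SCRIPT_STYLE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.DOTALL | re.IGNORECASE)
# HTML comments.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# Any remaining tag.
TAG = re.compile(r"<[^>]+>")

def strip_tags(markup):
    """Return an approximation of the visible text in an HTML string."""
    text = SCRIPT_STYLE.sub("", markup)   # remove code and CSS first
    text = COMMENT.sub("", text)          # then comments
    text = TAG.sub(" ", text)             # then tags, keeping words apart
    text = html.unescape(text)            # decode &amp;, &nbsp;, etc.
    return " ".join(text.split())         # collapse runs of whitespace

sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><p>Tom &amp; Jerry</p><script>var x = 1 < 2;</script></body></html>"
)
print(strip_tags(sample))
```

Decoding entities instead of deleting them keeps the ampersand in "Tom & Jerry". For real pages, an HTML parser (for example html.parser in the standard library) is usually more robust than any regex.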
