Strip HTML from strings in Python

难免孤独 2020-11-22 02:50

from mechanize import Browser
br = Browser()
br.open('http://somewebpage')
html = br.response().readlines()
for line in html:
    print(line)

When p

26 Answers

挽巷 (original poster)

2020-11-22 03:05

Simple code! This removes every tag, along with everything written inside the angle brackets.

def rm(s):
    # Cut out each '<' ... '>' span in turn until no complete pair is left.
    while True:
        start = s.find('<')
        end = s.find('>', start + 1) if start != -1 else -1
        if end == -1:
            break
        s = s[:start] + s[end + 1:]
    # Turn non-breaking-space entities into plain spaces.
    s = s.replace('&nbsp;', ' ')
    return s

But it won't give the right result if the text itself contains < or > symbols, because everything from a literal < to the next > is removed as if it were a tag.
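
One possible way around that limitation, which is not part of the answer above, is to hand the string to the standard library's html.parser module and keep only the text nodes it reports. The sketch below assumes Python 3; the names TextExtractor and strip_tags are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and ignores tags (hypothetical helper class)."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &lt; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called by the parser for every run of text between tags.
        self.parts.append(data)


def strip_tags(html):
    """Return the text content of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text the parser is still buffering
    return ''.join(parser.parts)


print(strip_tags('<p>Hello <b>world</b></p>'))
```

Because the parser decides what counts as a tag, escaped symbols such as &lt; come through as text instead of being cut. Note that this sketch also keeps the contents of script and style elements, so those would need extra handling if they matter.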
