Strip HTML from strings in Python

难免孤独 2020-11-22 02:50
from mechanize import Browser

br = Browser()
br.open('http://somewebpage')
html = br.response().readlines()
for line in html:
    print(line)

When p

26 answers
  •  挽巷
    挽巷 (original poster)
    2020-11-22 03:05

    Simple code! This removes every tag, i.e. everything between each < and its matching >.

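    def rm(s):
        """Return s with every <...> span removed."""
        out = []
        inside = False                # True while between '<' and '>'
        for ch in s:
            if inside:
                if ch == '>':
                    inside = False    # tag closed, resume copying
            elif ch == '<':
                inside = True         # tag opened, stop copying
            else:
                out.append(ch)
        # Turn non-breaking spaces (entity or character) into plain spaces.
        return ''.join(out).replace('&nbsp;', ' ').replace('\xa0', ' ')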
    

    But it won't give the full result if the text itself contains < or > symbols.
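
One way around that limitation, as a minimal sketch rather than a definitive fix: let the standard library's `html.parser.HTMLParser` do the tokenizing and keep only the text nodes. The names `TextExtractor` and `strip_tags` are made up for this illustration and do not come from the thread.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects only the text nodes it is fed."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &lt; inside text.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags, never for the tags themselves.
        self.parts.append(data)

    def text(self):
        return ''.join(self.parts)


def strip_tags(html):
    """Return html with the markup removed and the text content kept."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()  # flush any text still buffered by the parser
    return extractor.text()


print(strip_tags('<p>Hello <b>world</b></p>'))
print(strip_tags('<p>1 &lt; 2</p>'))
```

Because the parser decodes entities, an encoded `&lt;` comes back as a plain `<` in the output instead of being mistaken for the start of a tag. Keep in mind that text inside `<script>` and `<style>` is also passed to `handle_data`, so this sketch keeps it; filter those elements separately if they should be dropped.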
