Searching through webpage

浪子不回头ぞ submitted on 2020-01-22 10:06:25

Question


Hey, I'm working on a Python project that needs to search a webpage for a specific piece of text. If the text is found, it should print something out; if not, it should print an error message. I've already tried different modules such as libxml, but I can't figure out how to do it.

Could anybody lend some help?


Answer 1:


You could do something simple like:


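import re
import urllib.request  # Python 3 replacement for urllib2

# Download the page and decode the raw bytes into text (UTF-8 assumed)
with urllib.request.urlopen('http://www.domain.com') as response:
    html_content = response.read().decode('utf-8', errors='replace')

# Collect every match of the pattern in the page source
matches = re.findall('regex of string to find', html_content)

if not matches:
    print('I did not find anything')
else:
    print('My string is in the html')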
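
If the text you are looking for is a literal string rather than a pattern, a plain `in` test is probably enough. Keeping the check separate from the download also lets you try it without a network connection. Here is a minimal sketch in Python 3. The names `contains_text` and `check_page` are hypothetical, and the case-insensitive comparison and UTF-8 decoding are assumptions you may want to change.

```python
import urllib.error
import urllib.request


def contains_text(html, needle):
    """Return True if needle occurs in html, ignoring case."""
    return needle.lower() in html.lower()


def check_page(url, needle):
    """Fetch url and report whether needle appears in its source."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            # UTF-8 is assumed here; undecodable bytes are replaced
            html = response.read().decode('utf-8', errors='replace')
    except urllib.error.URLError as exc:
        # Covers DNS failures, refused connections and HTTP errors
        return 'Error: could not fetch page ({})'.format(exc)
    if contains_text(html, needle):
        return 'My string is in the html'
    return 'I did not find anything'
```

Calling something like `print(check_page('http://www.domain.com', 'some text'))` would then print one of the three messages.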



Answer 2:


lxml is awesome: http://lxml.de/parsing.html

I use it regularly with XPath to extract data from HTML.

The other option is http://www.crummy.com/software/BeautifulSoup/ which is great as well.
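
Both libraries parse the page so you can search its text instead of the raw markup. To show the idea without installing anything, here is a rough sketch that swaps in the standard library's `html.parser` module. It is not lxml or BeautifulSoup, and it is far less forgiving of broken HTML. The names `TextExtractor`, `visible_text` and `sample` are made up for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text between tags, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside script/style
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(html):
    """Return the page text with runs of whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return ' '.join(' '.join(parser.chunks).split())


# Hypothetical sample page; in practice this would be the downloaded HTML
sample = ('<html><body><h1>Hi</h1><script>var secret = 1;</script>'
          '<p>Find <b>me</b></p></body></html>')
text = visible_text(sample)
print('Find me' in text)   # the phrase spans a <b> tag in the markup
print('secret' in text)    # script contents are not part of the text
```

Searching the extracted text means a phrase split across tags (like "Find <b>me</b>") can still be found, which a search over the raw source would miss.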



Source: https://stackoverflow.com/questions/4925966/searching-through-webpage
