How can I request an HTML page in an HTTP request while asking only for a regex match or a specific HTML tag in it?

可紊 submitted on 2020-01-03 05:20:47

Question


Is there any header or method in the HTTP protocol that lets you get a specific tag from an HTML resource? For example, I would like to get only certain tags in this Python request instead of the whole HTML page. Is there anything I can set on the request that is supported by HTTP 1.1 or 1.0?

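import http.client  # Python 3 name of the Python 2 httplib module

def printText(txt):
    # Print each line of the body with surrounding whitespace removed.
    lines = txt.split('\n')
    for line in lines:
        print(line.strip())


httpServ = http.client.HTTPConnection("www.google.com")
httpServ.connect()
httpServ.request('GET', "/search?q=blabla")

response = httpServ.getresponse()
if response.status == http.client.OK:
    # read() returns bytes in Python 3, so decode before splitting into lines.
    printText(response.read().decode('utf-8', errors='replace'))
else:
    print("NOT OK", response.status)
httpServ.close()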

Answer 1:


HTTP headers such as Accept let you say that you want HTML, but nothing in HTTP lets you select a specific part of the tag tree.

If the server supports range requests, you can pull down the HTML in discrete blocks with the Range header. The blocks are byte intervals, so they will not line up with the beginning or end of any tag. You can then search each block as it arrives until you find the tag of interest.
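A minimal sketch of that block-by-block search, under some assumptions: the names make_http_fetcher and find_first_tag are hypothetical helpers, not part of any library, and the regex is only a rough matcher for simple, non-nested tags. Because a tag can straddle two blocks, the blocks are accumulated in a buffer and the buffer is searched after each fetch.

```python
import http.client
import re


def make_http_fetcher(host, path):
    """Return a fetch(start, end) callable that issues Range requests.

    host and path are placeholders supplied by the caller.
    """
    def fetch(start, end):
        conn = http.client.HTTPSConnection(host, timeout=10)
        try:
            # Byte ranges are inclusive on both ends.
            conn.request("GET", path, headers={"Range": f"bytes={start}-{end}"})
            resp = conn.getresponse()
            body = resp.read()
        finally:
            conn.close()
        if resp.status == 206:   # Partial Content: the range was honoured
            return body
        if resp.status == 416:   # Range Not Satisfiable: past the end
            return b""
        # A 200 here means the server ignored the Range header.
        raise RuntimeError(f"range request not honoured (status {resp.status})")
    return fetch


def find_first_tag(fetch, tag, chunk_size=1024, max_bytes=1_000_000):
    """Fetch chunk_size bytes at a time until <tag>...</tag> is complete.

    Returns the matched bytes, or None if the tag never appears.
    """
    name = re.escape(tag.encode("ascii"))
    pattern = re.compile(
        rb"<" + name + rb"\b[^>]*>.*?</" + name + rb"\s*>",
        re.DOTALL | re.IGNORECASE,
    )
    buffer = b""
    offset = 0
    while offset < max_bytes:
        chunk = fetch(offset, offset + chunk_size - 1)
        if not chunk:
            break  # end of the resource
        buffer += chunk
        offset += len(chunk)
        match = pattern.search(buffer)
        if match:
            return match.group(0)
    return None


# Hypothetical usage (performs real network requests):
# fetch = make_http_fetcher("example.com", "/")
# print(find_first_tag(fetch, "title"))
```

This only saves bandwidth when the tag sits near the start of the page, and each block costs a round trip, so downloading the whole page is often simpler.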

Otherwise, you'll likely have to download the whole page and run lxml, html5lib, or BeautifulSoup on the result.
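As a rough illustration of the download-then-parse approach, here is a sketch that filters one tag out of an already downloaded page. It deliberately swaps in the standard library's html.parser instead of the third-party libraries named above, so it has no dependencies. TagCollector and extract_tag_text are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect the text content of every occurrence of one tag."""

    def __init__(self, tag):
        super().__init__(convert_charrefs=True)
        self.tag = tag.lower()
        self.depth = 0       # nesting depth inside the wanted tag
        self.results = []    # one string per outermost occurrence
        self._parts = []     # text pieces of the occurrence being read

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            if self.depth == 0:
                self._parts = []
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self._parts).strip())

    def handle_data(self, data):
        # Keep text only while inside the wanted tag.
        if self.depth > 0:
            self._parts.append(data)


def extract_tag_text(html, tag):
    """Return the text of each <tag> element found in the HTML string."""
    collector = TagCollector(tag)
    collector.feed(html)
    collector.close()
    return collector.results
```

The html argument would be the decoded body returned by response.read() in the question's code. The filtering happens on the client after the full download, since the server cannot do it for you.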

Good luck with your quest.




Answer 2:


No, you have to get the entire page. The HTTP protocol does not provide a means of downloading a partial page by HTML element.




Answer 3:


Although you can't make such a request via HTTP, you can use BeautifulSoup, a Python module that will parse the HTML for you once the page is downloaded.



Source: https://stackoverflow.com/questions/8684051/how-can-i-request-an-html-page-in-an-http-request-and-asking-only-for-some-re
