Question
Is there any header or method in the HTTP protocol that lets you fetch only a specific tag from an HTML resource? For example, in the Python request below I would like to get only the tags I am interested in instead of the whole HTML page. Is there anything I can set on the request that HTTP 1.0 or 1.1 supports?
import httplib  # Python 2 standard library

def printText(txt):
    # Print each line of the body with surrounding whitespace removed.
    lines = txt.split('\n')
    for line in lines:
        print line.strip()

httpServ = httplib.HTTPConnection("www.google.com")
httpServ.connect()
httpServ.request('GET', "/search?q=blabla")
response = httpServ.getresponse()
if response.status == httplib.OK:
    printText(response.read())
else:
    print "NOT OK", response.status
httpServ.close()
Answer 1:
HTTP headers such as Accept let you say that you want HTML, but they don't let you select a specific part of the tag tree.
If the server accepts ranges (the Range request header), you can pull down the HTML in discrete blocks. The blocks are byte intervals, so they do not line up with the beginning or end of any tag. You can then search each block until you find the tag of interest.
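As a rough illustration of the byte-block idea, here is a minimal Python 3 sketch using http.client (the Python 3 name for httplib). The helper names range_header and fetch_block are made up for this example, and whether a given server honours the Range header at all is an assumption you have to check from the response status.

```python
import http.client


def range_header(start, length):
    """Build a Range header asking for `length` bytes starting at `start`."""
    # Byte ranges are inclusive on both ends, hence the -1.
    return {"Range": "bytes=%d-%d" % (start, start + length - 1)}


def fetch_block(host, path, start, length):
    """Request one byte block and return (status, body).

    Status 206 (Partial Content) means the server honoured the range.
    Status 200 means it ignored the header and sent the whole resource.
    """
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("GET", path, headers=range_header(start, length))
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

Calling fetch_block(host, path, 0, 1024) and then fetch_block(host, path, 1024, 1024) would walk through the page 1024 bytes at a time. A tag can straddle two blocks, so the search has to keep the tail of the previous block around.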
Otherwise, you'll likely have to download the whole page and run lxml, html5lib, or BeautifulSoup on the result.
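To show what the client-side filtering step might look like, here is a small Python 3 sketch. It uses the standard library's html.parser as a dependency-free stand-in for the parsers named above, not any of those libraries themselves. TagCollector and collect_tags are hypothetical names, and the `a` tag in the usage note is only an arbitrary example.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect the attributes of every start tag with a given name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()  # HTMLParser reports tag names in lower case
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this start tag.
        if tag == self.tag:
            self.found.append(dict(attrs))


def collect_tags(html_text, tag):
    """Return one attribute dict per occurrence of `tag` in `html_text`."""
    parser = TagCollector(tag)
    parser.feed(html_text)
    parser.close()
    return parser.found
```

For instance, collect_tags(page, "a") on a downloaded page (decoded to str) would give one dict per link, from which the href values can be read. The whole page still travels over the network; only the filtering happens locally.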
Good luck with your quest.
Answer 2:
No, you have to get the entire page. The HTTP protocol does not provide a means of downloading a partial page by HTML element.
Answer 3:
Although you can't make such a request via HTTP, you can use BeautifulSoup, a Python module that will parse the HTML for you once you have downloaded it.
Source: https://stackoverflow.com/questions/8684051/how-can-i-request-an-html-page-in-an-http-request-and-asking-only-for-some-re