Python urllib2 automatic form filling and retrieval of results

久未见 submitted on 2019-12-18 11:56:15

Question


I'm looking to query a site for warranty information about the machine this script is running on. It should be able to fill out a form if needed (as on HP's service site, for example) and then retrieve the resulting web page.

I already have the bits in place to parse the HTML that comes back. I'm just having trouble with how to POST the data that needs to go into the form fields and then retrieve the resulting page.


Answer 1:


If you absolutely need to use urllib2, the basic gist is this:

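import urllib
import urllib2

url = 'http://whatever.foo/form.html'
# Keys are the name attributes of the form's input fields.
form_data = {'field1': 'value1', 'field2': 'value2'}
# Encode the fields as application/x-www-form-urlencoded.
params = urllib.urlencode(form_data)
# Passing a data argument makes urlopen() send a POST instead of a GET.
response = urllib2.urlopen(url, params)
data = response.read()  # the HTML of the resulting page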

If you send along POST data (the 2nd argument to urlopen()), the request method is automatically set to POST.
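The GET/POST switch can be checked without touching the network by building the request objects and asking them for their method. This is only a sketch: the try/except import is an assumption added so the snippet also runs on Python 3, where urllib2 was split into urllib.request and urllib.parse and the POST body must be bytes.

```python
try:  # Python 2
    from urllib import urlencode
    from urllib2 import Request
except ImportError:  # Python 3
    from urllib.parse import urlencode
    from urllib.request import Request

url = 'http://whatever.foo/form.html'
# .encode() is a no-op on Python 2 and produces the bytes Python 3 requires.
params = urlencode({'field1': 'value1', 'field2': 'value2'}).encode('ascii')

get_request = Request(url)           # no data: a plain GET
post_request = Request(url, params)  # data present: becomes a POST

print(get_request.get_method())   # GET
print(post_request.get_method())  # POST
```

Nothing is sent here; passing post_request to urlopen() would perform the actual submission.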

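I suggest you do yourself a favor and use mechanize, a urllib2-based library that behaves much like a real browser (it does not run JavaScript, though). A lot of sites use hidden fields, cookies, and redirects. urllib2 follows redirects by default, but it does not keep cookies unless you install an HTTPCookieProcessor, and it knows nothing about hidden form fields. mechanize handles all three for you.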
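If staying with the standard library is a requirement, cookies at least can be covered with an opener that carries a cookie jar. A minimal sketch, assuming the site only needs cookies to persist between requests; the try/except import is added so it also runs on Python 3, where the modules are named http.cookiejar and urllib.request.

```python
try:  # Python 2
    import cookielib
    import urllib2
except ImportError:  # Python 3
    import http.cookiejar as cookielib
    import urllib.request as urllib2

# The jar stores cookies set by responses and replays them on later requests.
cookie_jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie_jar))

# Use opener.open(url, params) instead of urllib2.urlopen(url, params)
# so that every request made through it shares the same cookies.
```

Hidden fields still have to be read out of the form's HTML and included in the POST data by hand.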

Check out Emulating a browser in Python with mechanize for a good example.




Answer 2:


Using urllib and urllib2 together,

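import urllib
import urllib2
data = urllib.urlencode([('field1', val1), ('field2', val2)])  # list of two-element tuples
content = urllib2.urlopen('post-url', data).read()

content will then hold the page source as a string. Without the call to read(), you get the response object rather than the page itself.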
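One reason to pass a list of tuples rather than a dictionary is that the list keeps its order and may repeat a field name. A small sketch with made-up values; the try/except import is an assumption so it runs on Python 3 as well.

```python
try:  # Python 2
    from urllib import urlencode
except ImportError:  # Python 3
    from urllib.parse import urlencode

# The same field name appears twice; spaces and '&' are escaped.
pairs = [('field1', 'a b'), ('field1', 'c'), ('field2', 'x&y')]
encoded = urlencode(pairs)

print(encoded)  # field1=a+b&field1=c&field2=x%26y
```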




Answer 3:


I’ve only done a little bit of this, but:

  1. You’ve got the HTML of the form page. Extract the name attribute for each form field you need to fill in.
  2. Create a dictionary mapping the name of each form field to the value you want to submit.
  3. Use urllib.urlencode to turn the dictionary into the body of your post request.
  4. Include this encoded data as the second argument to urllib2.Request(), after the URL that the form should be submitted to.
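The four steps above can be sketched roughly as follows. Everything specific here is hypothetical: the sample HTML, the field names session_token and serial_number, the FormFieldParser class, and the submit URL are made up for illustration. The parser only looks at input tags, so select and textarea fields would need extra handling.

```python
try:  # Python 2
    from HTMLParser import HTMLParser
    from urllib import urlencode
    from urllib2 import Request
except ImportError:  # Python 3
    from html.parser import HTMLParser
    from urllib.parse import urlencode
    from urllib.request import Request

# Hypothetical form page; in practice this is the HTML you already fetched.
SAMPLE_FORM_HTML = '''
<form action="/warranty/lookup" method="post">
  <input type="hidden" name="session_token" value="abc123">
  <input type="text" name="serial_number">
  <input type="submit" value="Check">
</form>
'''


class FormFieldParser(HTMLParser):
    """Collect the name and default value of every named <input> tag."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attributes = dict(attrs)
        name = attributes.get('name')
        if name:  # unnamed inputs (like the submit button here) are not sent
            self.fields[name] = attributes.get('value') or ''


# Step 1: extract the field names (hidden fields keep their preset values).
parser = FormFieldParser()
parser.feed(SAMPLE_FORM_HTML)

# Step 2: fill in the values you want to submit.
fields = dict(parser.fields)
fields['serial_number'] = 'XYZ0001'

# Step 3: encode the fields as the POST body (sorted for a stable order).
body = urlencode(sorted(fields.items())).encode('ascii')

# Step 4: attach the body to a request aimed at the form's action URL.
request = Request('http://whatever.foo/warranty/lookup', body)
print(request.get_method())  # POST
```

Passing request to urlopen() would submit the form and return the response to read.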

The server will either return the resulting web page or return a redirect to it. urllib2.urlopen() follows redirects automatically by default, so you only need to issue a separate GET request to the URL in the redirect response if you have disabled that behaviour or are using a lower-level client.

I hope that makes some sort of sense?



Source: https://stackoverflow.com/questions/5667699/python-urllib2-automatic-form-filling-and-retrieval-of-results
