问题
I'm looking to be able to query a site for warranty information on a machine that this script would be running on. It should be able to fill out a form if needed ( like in the case of say HP's service site) and would then be able to retrieve the resulting web page.
I already have the bits in place to parse the resulting html that is reported back I'm just having trouble with what needs to be done in order to do a POST of data that needs to be put in the fields and then being able to retrieve the resulting page.
回答1:
If you absolutely need to use urllib2, the basic gist is this:
import urllib
import urllib2
url = 'http://whatever.foo/form.html'
form_data = {'field1': 'value1', 'field2': 'value2'}
params = urllib.urlencode(form_data)
response = urllib2.urlopen(url, params)
data = response.read()
If you send along POST data (the 2nd argument to urlopen()
), the request method is automatically set to POST.
I suggest you do yourself a favor and use mechanize, a full-blown urllib2 replacement that acts exactly like a real browser. A lot of sites use hidden fields, cookies, and redirects, none of which urllib2 handles for you by default, where mechanize does.
Check out Emulating a browser in Python with mechanize for a good example.
回答2:
Using urllib and urllib2 together,
data = urllib.urlencode([('field1',val1), ('field2',val2)]) # list of two-element tuples
content = urllib2.urlopen('post-url', data)
content will give you the page source.
回答3:
I’ve only done a little bit of this, but:
- You’ve got the HTML of the form page. Extract the
name
attribute for each form field you need to fill in. - Create a dictionary mapping the names of each form field with the values you want submit.
- Use
urllib.urlencode
to turn the dictionary into the body of your post request. - Include this encoded data as the second argument to
urllib2.Request()
, after the URL that the form should be submitted to.
The server will either return a resulting web page, or return a redirect to a resulting web page. If it does the latter, you’ll need to issue a GET
request to the URL specified in the redirect response.
I hope that makes some sort of sense?
来源:https://stackoverflow.com/questions/5667699/python-urllib2-automatic-form-filling-and-retrieval-of-results