Question
I have a problem similar to "Selecting an unnamed text field in a mechanize form (python)" and "Use mechanize to submit form without control name".
I want to scrape data from a website that sits behind a login screen. However, I don't know how to select a form field that has no name. The controls look like this:
<TextControl(<None>=)>
<PasswordControl(<None>=)>
<CheckboxControl(<None>=[on])>
<SubmitButtonControl(<None>=) (readonly)>
Usually the control is shown as <TextControl(login=)>, so I can use br.form['login'] = 'mylogin'.
But this time I can't, since I don't know the name of the login field.
I'm able to access the form, but I can't fill in the TextControl or PasswordControl, I guess because they have no name. My basic code looks like this:
import mechanize
from bs4 import BeautifulSoup
import urllib2
import cookielib
cj = cookielib.CookieJar()
br = mechanize.Browser()
br.set_cookiejar(cj)
br.set_handle_robots(False)
hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
       'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
       'Accept-Encoding': 'none',
       'Accept-Language': 'en-US,en;q=0.8',
       'Connection': 'keep-alive'}
url = "http://www.example.com"  # urllib2 needs the scheme, otherwise it raises "unknown url type"
request = urllib2.Request(url, None, hdr)
response = br.open(request)
forms = [form for form in br.forms()][0]
br.select_form(nr=0)
I tried things like this:
br.form.find_control(id="id").value = "loginname"
and this:
forms[0].set_value("new value", nr=0)
These throw errors such as mechanize._response.httperror_seek_wrapper: HTTP Error 403: Forbidden or TypeError: control name must be string-like. I don't know what else to try. Please help me out here.
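The TypeError is consistent with how `forms` is built: `[form for form in br.forms()][0]` already evaluates to the first form rather than a list, and mechanize's form object appears to look controls up by name when indexed, so `forms[0]` asks the form for a control whose name is the integer 0. Keeping the list (`forms = list(br.forms())`) would make `forms[0]` the form itself. The sketch below does not use mechanize: `ToyForm` and `all_forms` are hypothetical stand-ins (plain Python 3) that only imitate a name-based lookup with a string check, to show the difference between indexing the list and indexing the form.

```python
class ToyForm:
    """Hypothetical stand-in for a form that looks controls up by name.

    This is NOT mechanize's class; it only imitates the check that
    rejects non-string control names.
    """

    def __init__(self, controls):
        self.controls = controls  # list of (name, value) pairs

    def __getitem__(self, name):
        # Indexing a form means "give me the control with this name".
        if not isinstance(name, str):
            raise TypeError("control name must be string-like")
        for control_name, value in self.controls:
            if control_name == name:
                return value
        raise KeyError(name)


all_forms = [ToyForm([("login", "")])]
first_form = [form for form in all_forms][0]  # same pattern as `forms = [...][0]`

try:
    first_form[0]  # asks the *form* for a control named 0
except TypeError as exc:
    print("TypeError:", exc)

print(all_forms[0] is first_form)  # indexing the list itself yields the form
```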
Answer 1:
Starting from your code:
url = "http://www.example.com"  # the scheme is required
request = urllib2.Request(url, None, hdr)
response = br.open(request)
forms = [form for form in br.forms()][0]
br.select_form(nr=0)
Then list the controls together with their index:
aux = 0
for f in br.form.controls:
    print f,
    print ' ---> Number: ',
    print aux
    aux = aux + 1
The result is:
The results is:
<TextControl(<None>=)> ---> Number: 0
<PasswordControl(<None>=)> ---> Number: 1
<CheckboxControl(<None>=[on])> ---> Number: 2
<SubmitButtonControl(<None>=) (readonly)> ---> Number: 3
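The same positional view can be reproduced without mechanize, which may help when inspecting a page by hand. This is a sketch in Python 3 using only the standard `html.parser` module: `SAMPLE_HTML` is a made-up page that merely mirrors the four controls listed above, and `FormControlLister` is a hypothetical helper, not part of mechanize. It numbers the `<input>` elements of the first form in document order and records `None` where the `name` attribute is missing, which is why such controls can only be addressed by position. It ignores `<select>`, `<textarea>` and `<button>` for brevity.

```python
from html.parser import HTMLParser


class FormControlLister(HTMLParser):
    """Collect the <input> controls of the first <form> in document order."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.done = False
        self.controls = []  # list of (index, type, name) tuples

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self.done:
            self.in_form = True
        elif self.in_form and tag == "input":
            # A missing name attribute is kept as None, like mechanize's <None>.
            self.controls.append(
                (len(self.controls), attrs.get("type", "text"), attrs.get("name"))
            )

    def handle_endtag(self, tag):
        if tag == "form" and self.in_form:
            # Stop after the first form, similar to select_form(nr=0).
            self.in_form = False
            self.done = True


SAMPLE_HTML = """
<form action="/login" method="post">
  <input type="text">
  <input type="password">
  <input type="checkbox" checked>
  <input type="submit">
</form>
"""

parser = FormControlLister()
parser.feed(SAMPLE_HTML)
for index, ctype, name in parser.controls:
    print(index, ctype, name)
```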
Now, you can try this:
br.form.controls[0]._value = "loginname"
br.form.controls[1]._value = "password"
Printing the controls again:
for f in br.form.controls:
    print f
The result will be:
<TextControl(<None>=loginname)>
<PasswordControl(<None>=password)>
<CheckboxControl(<None>=[on])>
<SubmitButtonControl(<None>=) (readonly)>
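One caveat worth checking: under the HTML form-submission rules, a control without a `name` attribute is left out of the submitted data, and mechanize appears to follow the same rule when it builds the request. So values set as above may show up when printing the form but still not be sent; if the login keeps failing, it may be worth looking at the browser's developer tools to see what the real login request contains. The simplified Python 3 sketch below (standard library only) imitates just that one rule; `form_data_set` and the `"remember"` control are hypothetical, and the sketch ignores checkbox state, submit buttons and the other details of the full algorithm.

```python
from urllib.parse import urlencode


def form_data_set(controls):
    """Build (name, value) pairs roughly the way HTML form submission does.

    `controls` is a list of dicts with "name" and "value" keys; a name of
    None stands for a control that has no name attribute.
    """
    pairs = []
    for control in controls:
        # Controls without a (non-empty) name are skipped entirely.
        if not control.get("name"):
            continue
        pairs.append((control["name"], control["value"]))
    return pairs


controls = [
    {"name": None, "value": "loginname"},
    {"name": None, "value": "password"},
    {"name": "remember", "value": "on"},  # hypothetical named control, for contrast
]

body = urlencode(form_data_set(controls))
print(repr(body))  # only the named control survives
```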
Source: https://stackoverflow.com/questions/27486361/mechanizer-in-python-selecting-form-field-with-no-name