Question
I'm currently using iMacros to extract data from a website and to fill in and submit forms with that data.
But iMacros is an expensive tool. I need a free library, and I've read about Scrapy for data mining. It's a little more complex to program with, but the money rules.
The question is whether I can fill in HTML forms with Scrapy and submit them to the web page. I don't want to use JavaScript; I want to use Python scripts exclusively.
I searched http://doc.scrapy.org/ but found nothing about form submission.
Answer 1:
Use the scrapy.http.FormRequest class.
The FormRequest class extends the base Request with functionality for dealing with HTML forms.
http://doc.scrapy.org/en/latest/topics/request-response.html#formrequest-objects
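For reference, here is a minimal sketch of a spider that submits a form with FormRequest.from_response; the URL, the field names 'username' and 'password', and the callback name are placeholders, not part of the original answer:
import scrapy
from scrapy.http import FormRequest

class FormSpider(scrapy.Spider):
    name = 'form_example'
    start_urls = ['http://example.com/login']  # placeholder URL

    def parse(self, response):
        # from_response() pre-fills the fields found in the page's <form>
        # (including hidden inputs) and merges in the values from formdata
        return FormRequest.from_response(
            response,
            formdata={'username': 'user', 'password': 'secret'},
            callback=self.after_submit,
        )

    def after_submit(self, response):
        # Continue scraping the page returned after the form submission
        self.logger.info('Form submitted, landed on %s', response.url)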
Answer 2:
Mechanize is a Python library that lets you automate interactions with a website. It supports HTML form filling.
Answer 3:
The program below shows how to fill in and submit a form:
import mechanize
import cookielib
from BeautifulSoup import BeautifulSoup
import html2text
# Browser
br = mechanize.Browser()
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
# Follows refresh 0 but not hangs on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
# User-Agent (this is cheating, ok?)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
# The site we will navigate into, handling its session
br.open('http://gmail.com')
# Select the first (index zero) form
br.select_form(nr=0)
# User credentials
br.form['Email'] = 'user'
br.form['Passwd'] = 'password'
# Login
br.submit()
# Filter all links to mail messages in the inbox
all_msg_links = [l for l in br.links(url_regex='\?v=c&th=')]
# Select the first 3 messages
for msg_link in all_msg_links[0:3]:
    print msg_link
    # Open each message
    br.follow_link(msg_link)
    html = br.response().read()
    soup = BeautifulSoup(html)
    # Filter html to only show the message content
    msg = str(soup.findAll('div', attrs={'class': 'msg'})[0])
    # Show raw message content
    print msg
    # Convert html to text, easier to read but can fail if you have intl
    # chars
    # print html2text.html2text(msg)
    print
    # Go back to the Inbox
    br.follow_link(text='Inbox')
# Logout
br.follow_link(text='Sign out')
Source: https://stackoverflow.com/questions/21929473/can-i-fill-web-forms-with-scrapy