The site I\'m trying to spider is using the javascript:
request.open(\"POST\", url, true);
To pull in extra information over ajax that I need t
This was what I came up with:
req = mechanize.Request("https://www.site.com/path/" + url, " ")
req.add_header("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.7) Gecko/20100713 Firefox/3.6.7")
req.add_header("Referer", "https://www.site.com/path")
cj.add_cookie_header(req)
res = mechanize.urlopen(req)
Whats interesting is the " " in the call to mechanize.Request forces it into "POST" mode. Obviously the site didn't choke on a single space :)
It needed the cookies as well. I debugged the headers using:
hh = mechanize.HTTPHandler()
hsh = mechanize.HTTPSHandler()
hh.set_http_debuglevel(1)
hsh.set_http_debuglevel(1)
opener = mechanize.build_opener(hh, hsh)
logger = logging.getLogger()
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.setLevel(logging.NOTSET)
mechanize.install_opener(opener)
Against what Firebug was showing.