Question
Short version: how can I execute or simulate a JavaScript redirect like the following with Python mechanize?
location.href="http://www.site2.com/";
I've written a Python script with the mechanize module that looks for a link on a page and follows it.
The problem is with one particular site: when I do
br.follow_link(url="http://www.address1.com")
it redirects me to this simple page:
<script language="JavaScript">{
location.href="http://www.site2.com/";
self.focus();
}</script>
Now, if I do:
import mechanize

br = mechanize.Browser(factory=mechanize.RobustFactory())
...  # other code
br.follow_link(url="http://www.address1.com")
for link in br.links():  # list every link mechanize found on the redirect page
    print(link)
it doesn't print anything, which means there are no links on that page. But if I parse the page manually and then execute:
br.open("http://www.site2.com")
Site2 doesn't recognize that I'm coming from "www.address1.com", and the script doesn't work the way I'd like!
Sorry if this is a newbie question, and thanks in advance!
P.S. I already call br.set_handle_referer(True).
EDIT: more info. Inspecting the request for that link with Fiddler2 shows:
GET http://www.site2.com/ HTTP/1.1
Host: www.site2.com
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.94 Safari/537.4
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Referer: http://www.address1.com
Accept-Encoding: gzip,deflate,sdch
Accept-Language: it-IT,it;q=0.8,en-US;q=0.6,en;q=0.4
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Cookie: PHPSESSID=6e161axxxxxxxxxxx; user=myusername; pass=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx; ip=79.xx.xx.xx; agent=a220243a8b8f83de64c6204a5ef7b6eb; __utma=154746788.943755841.1348303404.1350232016.1350241320.43; __utmb=154746788.12.10.1350241320; __utmc=154999999; __utmz=154746788.134999998.99.6.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=%something%something%
So is it a cookie problem?
Answer 1:
Mechanize can't deal with JavaScript because it can't interpret it. Try parsing the page manually and passing the extracted link to br.follow_link.
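A minimal sketch of the "parse it manually" step, written for Python 3 using only the standard library. The regex is a heuristic that recognizes simple location.href = "..." style assignments like the one in the question; it does not run JavaScript, so URLs built at runtime will not be found. find_js_redirect is a hypothetical helper name.

```python
import re
from urllib.parse import urljoin

# Heuristic pattern for simple redirects such as:
#   location.href="http://www.site2.com/";
#   window.location = '/next';
# It does not execute JavaScript; computed URLs will not be found.
_JS_REDIRECT = re.compile(
    r"""(?:window\.|document\.|self\.|top\.)?location(?:\.href)?\s*=\s*(["'])(.+?)\1""",
    re.IGNORECASE,
)


def find_js_redirect(html, base_url):
    """Return the absolute target of the first JS location assignment, or None."""
    if isinstance(html, bytes):
        html = html.decode("utf-8", "replace")
    match = _JS_REDIRECT.search(html)
    if match is None:
        return None
    # Resolve relative targets against the page URL, as a browser would.
    return urljoin(base_url, match.group(2))
```

With mechanize you would feed it the current page, e.g. find_js_redirect(br.response().read(), br.geturl()), and then request the returned URL with the Referer header set.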
Answer 2:
I solved it! Like this:
import cookielib
import urllib2

cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
...
br.follow_link(url="http://www.address1.com")
refe = br.geturl()  # URL of the page that contains the JS redirect
req = urllib2.Request(url='http://www.site2.com/')
req.add_header('Referer', refe)
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))  # reuse mechanize's cookies
f = opener.open(req)
htm = f.read()
print "\n\n", htm
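For reference, a Python 3 sketch of the same approach, assuming http.cookiejar and urllib.request (the Python 3 successors of cookielib and urllib2) are used. The helper names are hypothetical; building the request and the opener does not touch the network, only opener.open(req) does.

```python
import http.cookiejar
import urllib.request


def build_referred_request(target_url, referer):
    """Build a GET request for target_url that claims to come from referer."""
    req = urllib.request.Request(target_url)
    req.add_header("Referer", referer)
    return req


def build_cookie_opener(cookie_jar):
    """Opener that sends and stores cookies in cookie_jar.

    Passing the same jar to mechanize (br.set_cookiejar(cj)) lets both
    share one session, as in the answer above.
    """
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookie_jar))

# Usage (network access happens only in opener.open):
#   cj = http.cookiejar.LWPCookieJar()
#   opener = build_cookie_opener(cj)
#   req = build_referred_request("http://www.site2.com/", br.geturl())
#   html = opener.open(req).read()
```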
Answer 3:
How about
br.open("http://alpha.com")
br.follow_link("http://beta.com")
If you use br.follow_link, it hopefully sets the HTTP Referer header to the previous page, whereas br.open is like opening a new window and doesn't set the Referer header.
Edit: OK, it looks like .follow_link doesn't take strings; it takes a special mechanize.Link object with an .absolute_url attribute. You can fake that:
>>> class Fake:
... pass
...
>>> x = Fake()
>>> x.absolute_url="http://stackoverflow.com"
>>> br.follow_link(x)
<response_seek_wrapper at 0x2937af8 whose wrapped object = <closeable_response at 0x2937f08 whose fp = <socket._fileobject object at 0x02934970>>>
>>> br.title()
'Stack Overflow'
Or make a real mechanize.Link, which is less hacky but more tedious.
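A slightly tidier stand-in than the Fake class, assuming, as the session above suggests, that follow_link only reads .absolute_url from the object it is given. It uses urljoin so a relative href is resolved against the current page the way a browser would. make_link_stub is a hypothetical name, not part of mechanize.

```python
from types import SimpleNamespace
from urllib.parse import urljoin


def make_link_stub(base_url, href):
    """Hypothetical stand-in for mechanize.Link.

    Assumes Browser.follow_link only reads .absolute_url; .url keeps the
    raw href for debugging.
    """
    return SimpleNamespace(url=href, absolute_url=urljoin(base_url, href))

# Usage with mechanize (not run here):
#   br.follow_link(make_link_stub(br.geturl(), "http://www.site2.com/"))
```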
Answer 4:
You could set the HTTP Referer header explicitly before making your request:
br.addheaders = [('Referer', 'http://alpha.com')]
br.open("http://beta.com")
More details are in the surprisingly hard-to-find official docs: http://wwwsearch.sourceforge.net/mechanize/doc.html
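One caveat, demonstrated on the standard library's urllib.request opener, whose addheaders list mechanize's br.addheaders mirrors (an assumption worth checking against your mechanize version): assigning a new list replaces the default User-agent entry. The hypothetical set_referer helper below swaps in the Referer while keeping the other defaults. Headers in addheaders are sent with every later request from that opener, so the Referer stays until you change it.

```python
import urllib.request


def set_referer(opener, referer):
    """Set a default Referer on opener.addheaders, keeping the other headers.

    Assigning opener.addheaders = [("Referer", ...)] would drop the default
    User-agent entry; this only replaces an existing Referer.
    """
    opener.addheaders = [
        (name, value) for name, value in opener.addheaders
        if name.lower() != "referer"
    ]
    opener.addheaders.append(("Referer", referer))
    return opener

# Usage (assumes br.addheaders is a plain list of (name, value) pairs):
#   set_referer(br, "http://alpha.com")
#   br.open("http://beta.com")
```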
Source: https://stackoverflow.com/questions/12881423/mechanize-python-how-to-follow-a-link-in-a-simple-javascript