Question
I am trying to retrieve a page that uses JavaScript and a database to load its content. Loading takes about 2 to 3 minutes. I can fetch the initial page, which shows "Please wait 2 to 3 mins for the page to be loaded.", but I cannot retrieve the page after it has finished loading.
I have already tried the following:
1.) Using the mirror method in Mechanize. The response content is not decoded, so the saved file is gibberish. (I also tried writing a method similar to mirror that decodes the response content, but that doesn't work either: the new content is not loaded.)
2.) Adding an 'If-Modified-Since' request header. The time is still the same and the new content is not fetched.
Any pointers or suggestions would really be helpful.
TIA :)
Answer 1:
It won't work with Mechanize alone. First check what the JavaScript is doing to the page and where the data comes from. Then there are two possibilities:
- Mimic the JavaScript in Perl: fetch the initial page, then request the new data from the same place the JavaScript downloads it. Check whether the data is encoded in some way, and decode it in Perl.
- Use WWW::Mechanize::Firefox (it drives a real Firefox through the MozRepl add-on). Then you do not need to care about the JavaScript, since Firefox handles it. You can hide the application if you do not want to see it.
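For the first option, here is a minimal sketch, assuming the browser's network tab shows the JavaScript pulling its data from a separate URL. The helper names `save_decoded` and `fetch_data` are made up for illustration. It also touches on the "gibberish" problem from the question: mirror writes the body as received, which may be compressed, whereas `decoded_content` undoes any Content-Encoding such as gzip. `charset => 'none'` keeps the bytes otherwise unchanged so they can be written to disk as-is.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Save the body of an HTTP::Response to $path with any Content-Encoding
# (e.g. gzip) undone. Returns the number of bytes written.
# (Hypothetical helper, not part of WWW::Mechanize.)
sub save_decoded {
    my ($response, $path) = @_;

    # charset => 'none' skips character decoding, so we get raw bytes.
    my $bytes = $response->decoded_content( charset => 'none' );
    die "Could not decode response body\n" unless defined $bytes;

    open my $fh, '>:raw', $path or die "Cannot write $path: $!";
    print {$fh} $bytes;
    close $fh or die "Cannot close $path: $!";
    return length $bytes;
}

# Fetch the data URL directly, the way the page's JavaScript would,
# and save the decoded body. (Hypothetical helper.)
sub fetch_data {
    my ($url, $path) = @_;
    my $mech     = WWW::Mechanize->new( autocheck => 1 );
    my $response = $mech->get($url);    # returns an HTTP::Response
    return save_decoded($response, $path);
}
```

It could be called as `fetch_data('http://example.com/data.json', 'data.json');` where the URL is a placeholder for whatever address the page's JavaScript actually requests.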
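Example:
use strict;
use warnings;
use WWW::Mechanize::Firefox;
use HTML::TreeBuilder::LibXML;

# Drive Firefox so that the page's JavaScript is actually executed.
my $mech = WWW::Mechanize::Firefox->new;
$mech->get('http://example.com/ajax.html');

# Parse the rendered page and extract a value with XPath.
my $tree = HTML::TreeBuilder::LibXML->new;
$tree->parse($mech->content);
$tree->eof;
my $something = $tree->findvalue('/html/body/div[10]/table');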
The above code is not tested, but it should work.
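Since the page in the question takes 2 to 3 minutes to fill in, the content probably has to be re-read until the data has arrived. A rough polling sketch follows, assuming the "Please wait" text disappears from the page once loading is done. The helper `wait_for_content` and its option names are invented for illustration. It takes a code ref so it does not depend on any particular Mechanize class.

```perl
use strict;
use warnings;

# Call $get_content (a code ref returning the current page HTML) until
# the placeholder text is gone, or give up after $timeout seconds.
# Returns the HTML on success, undef on timeout.
# (Hypothetical helper; the defaults are assumptions.)
sub wait_for_content {
    my ($get_content, %opt) = @_;
    my $placeholder = $opt{placeholder} // 'Please wait';
    my $timeout     = $opt{timeout}     // 300;   # seconds in total
    my $interval    = $opt{interval}    // 5;     # seconds between polls

    my $deadline = time + $timeout;
    while (1) {
        my $html = $get_content->();
        # Done as soon as the placeholder no longer appears in the page.
        return $html if defined $html && index($html, $placeholder) < 0;
        return if time >= $deadline;
        sleep $interval;
    }
}
```

With the `$mech` object from the example above, it might be used as `my $html = wait_for_content(sub { $mech->content }, timeout => 300);` and the result passed to `$tree->parse` once it is defined.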
Enjoy.
Source: https://stackoverflow.com/questions/25129159/perl-mechanize-get-the-response-page-after-the-page-is-modified