Perl WWW::Mechanize (or LWP) get redirect url

蹲街弑〆低调 submitted on 2019-11-28 00:33:09

Question


I am using WWW::Mechanize to crawl sites. It works great, except when I request a URL such as:

http://www.levi.com/

I am redirected to:

http://us.levi.com/home/index.jsp

My script needs to know that this redirect took place and which URL I was redirected to. Is there any way to detect this with WWW::Mechanize or LWP and then get the redirected URL? Thanks!


Answer 1:


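use strict;
use warnings;
use URI;
use WWW::Mechanize;

my $url = 'http://...';
# autocheck => 0 stops Mechanize from dying on non-2xx responses.
my $mech = WWW::Mechanize->new(autocheck => 0);
# Do not follow redirects automatically, so the 3xx response is visible.
$mech->max_redirect(0);
$mech->get($url);

my $status = $mech->status();
if (($status >= 300) && ($status < 400)) {
  my $location = $mech->response()->header('Location');
  if (defined $location) {
    print "Redirected to $location\n";
    # Location may be relative, so resolve it against the current base URL.
    $mech->get(URI->new_abs($location, $mech->base()));
  }
}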

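If the status code is 3xx, check the Location header of the response for the redirect URL. The header value may be relative, which is why the code resolves it against the base URL with URI->new_abs before requesting it.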
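To see what the URI->new_abs call in the code above does with different Location values, here is a small offline sketch. The helper name resolve_location is hypothetical (it is not part of URI or WWW::Mechanize), and the URLs are simply the ones from the question.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: turn a Location header value into an absolute URL
# by resolving it against the URL that was originally requested.
sub resolve_location {
    my ($location, $base) = @_;
    return URI->new_abs($location, $base)->as_string;
}

# A relative Location is resolved against the base URL.
print resolve_location('/home/index.jsp', 'http://us.levi.com/'), "\n";

# An absolute Location is returned unchanged, whatever the base is.
print resolve_location('http://us.levi.com/home/index.jsp', 'http://www.levi.com/'), "\n";
```

Both calls should print http://us.levi.com/home/index.jsp, so the same code path works whether the server sends a relative or an absolute Location.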




Answer 2:


You can also get the same information by calling the redirects() method on the response object.

use strict;
use warnings;
use feature qw( say );

use WWW::Mechanize;

my $ua = WWW::Mechanize->new;
my $res = $ua->get('http://metacpan.org');

# redirects() returns the intermediate responses, oldest first;
# the list is empty if no redirect took place.
my @redirects = $res->redirects;
if (@redirects) {
  say 'request uri: ' . $redirects[-1]->request->uri;
  say 'location header: ' . $redirects[-1]->header('Location');
}

Prints:

request uri: http://metacpan.org
location header: https://metacpan.org/

See https://metacpan.org/pod/HTTP::Response#$r-%3Eredirects for details. Keep in mind that more than one redirect may have taken you to your current location, so you may want to inspect every response returned by redirects().
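As a rough sketch of inspecting every response in the chain, the snippet below walks the whole list returned by redirects(). To keep it runnable without network access, it builds a fake two-hop chain by hand with HTTP::Response and its previous() accessor. The helper name redirect_hops and the example.com URLs are assumptions made for illustration; with a real request you would pass the response returned by get() instead.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

# Hypothetical helper: describe each redirect hop as "from -> to".
sub redirect_hops {
    my ($res) = @_;
    my @hops;
    for my $r ($res->redirects) {    # oldest redirect first
        my $from = $r->request ? $r->request->uri : '(unknown request)';
        my $to   = $r->header('Location') // '(no Location header)';
        push @hops, "$from -> $to";
    }
    return @hops;
}

# Hand-built chain (made-up URLs): /start over http, then https, then /home.
my $first = HTTP::Response->new(301, 'Moved Permanently',
    [ Location => 'https://example.com/start' ]);
$first->request(HTTP::Request->new(GET => 'http://example.com/start'));

my $second = HTTP::Response->new(302, 'Found',
    [ Location => 'https://example.com/home' ]);
$second->request(HTTP::Request->new(GET => 'https://example.com/start'));
$second->previous($first);

my $final = HTTP::Response->new(200, 'OK');
$final->request(HTTP::Request->new(GET => 'https://example.com/home'));
$final->previous($second);

print "$_\n" for redirect_hops($final);
```

If the list comes back empty, no redirect took place, and the final URL is the one you requested.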



Source: https://stackoverflow.com/questions/10922054/perl-wwwmechanize-or-lwp-get-redirect-url
