Why can't I fetch wikipedia pages with LWP::Simple?

邮差的信, submitted 2019-12-01 03:28:34

Apparently Wikipedia blocks LWP::Simple requests: http://www.perlmonks.org/?node_id=695886

The following works instead:

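#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $url = "http://en.wikipedia.org/wiki/Stack_overflow";

my $ua = LWP::UserAgent->new();
my $res = $ua->get($url);

# Stop with the HTTP status line (e.g. "403 Forbidden") if the request failed
die $res->status_line, "\n" unless $res->is_success;

# decoded_content undoes any Content-Encoding and charset encoding
binmode STDOUT, ':encoding(UTF-8)';
print $res->decoded_content;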

You can also set the user agent in LWP::Simple itself: import its $ua variable, which is the LWP::UserAgent object that LWP::Simple uses internally, and change its agent string:

use LWP::Simple qw/get $ua/;
$ua->agent("WikiBot/0.1");
print get("http://en.wikipedia.org/wiki/Stack_overflow");
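LWP::Simple's `get` returns `undef` on any failure, so a blocked request looks the same as every other error. A minimal sketch, assuming you want to stay with LWP::Simple, that surfaces the numeric status (such as 403) by using `getstore`, which saves the response body to a file and returns the HTTP status code. `fetch_to_file` is a hypothetical helper name, and "WikiBot/0.1" is just the example agent string from the snippet above:

```perl
#!/usr/bin/perl
use strict;
use warnings;
# getstore and is_success are exported by LWP::Simple; $ua is its shared agent object
use LWP::Simple qw(getstore is_success $ua);

# Replace LWP::Simple's default agent string (example name, use your own)
$ua->agent("WikiBot/0.1");

# Hypothetical helper: save $url to $file and return the numeric HTTP
# status code, which plain get() hides by returning undef on failure.
sub fetch_to_file {
    my ($url, $file) = @_;
    my $status = getstore($url, $file);
    warn "Fetching $url failed with status $status\n" unless is_success($status);
    return $status;
}

# Only touch the network when a URL and an output file are given
fetch_to_file(@ARGV) if @ARGV == 2;
```

Usage: `perl fetch.pl http://en.wikipedia.org/wiki/Stack_overflow page.html`. A 403 printed in the warning points at the user-agent block rather than a network problem.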

I solved this problem by using LWP::RobotUA instead of LWP::UserAgent. See the document below; only a few changes are needed.

http://lwp.interglacial.com/ch12_02.htm
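A minimal LWP::RobotUA sketch, assuming a libwww-perl recent enough to accept key/value constructor arguments. RobotUA refuses to start without an agent name and a contact e-mail address (`you@example.com` is a placeholder), fetches and obeys each site's robots.txt, and waits `delay` minutes between requests to the same server. `fetch_page` is a hypothetical helper name:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::RobotUA;

# RobotUA requires an agent name and a from address containing "@"
# (both are placeholders here; use your own).
my $ua = LWP::RobotUA->new(
    agent => 'WikiBot/0.1',
    from  => 'you@example.com',
);

# delay() is in minutes between requests to the same host; the default is 1.
# 1/60 means one second.
$ua->delay(1/60);

# Hypothetical helper: RobotUA checks robots.txt before the first request to a host
sub fetch_page {
    my ($url) = @_;
    my $res = $ua->get($url);
    die $res->status_line, "\n" unless $res->is_success;
    return $res->decoded_content;
}

# Only touch the network when a URL is given on the command line
if (@ARGV) {
    binmode STDOUT, ':encoding(UTF-8)';
    print fetch_page($ARGV[0]);
}
```

Compared with LWP::UserAgent, the only changes are the class name, the required `from` address, and the optional delay.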

It's because Wikipedia blocks the HTTP User-Agent string that LWP::Simple sends.

You will get a "403 Forbidden" response if you try to use it.

To work around this, use the LWP::UserAgent module and set its agent attribute.
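A sketch of that workaround, assuming you pass the agent attribute to the constructor and want a blocked agent to be reported explicitly. The agent string and the `fetch_page` helper are hypothetical examples:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Set the agent attribute at construction time (example string, use your own)
my $ua = LWP::UserAgent->new(agent => 'WikiBot/0.1 (you@example.com)');

# Hypothetical helper: return the page body, or die with a hint when the
# server answers 403, which is how a rejected user agent shows up.
sub fetch_page {
    my ($client, $url) = @_;
    my $res = $client->get($url);
    if ($res->code == 403) {
        die "403 Forbidden: the server rejected user agent '", $client->agent, "'\n";
    }
    die $res->status_line, "\n" unless $res->is_success;
    return $res->decoded_content;
}

# Only touch the network when a URL is given on the command line
if (@ARGV) {
    binmode STDOUT, ':encoding(UTF-8)';
    print fetch_page($ua, $ARGV[0]);
}
```

Passing the user agent object into `fetch_page` keeps the helper reusable with differently configured clients.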

Also see the MediaWiki-related CPAN modules: they are designed to work with MediaWiki sites (Wikipedia is one of them) and may offer more features than plain LWP.

http://cpan.uwinnipeg.ca/search?query=Mediawiki&mode=dist
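Before reaching for one of those modules, note that MediaWiki itself can return a page's raw wikitext through `index.php?title=...&action=raw`. The sketch below is plain LWP, not one of the CPAN modules above. It assumes the English Wikipedia base URL, an HTTPS-capable LWP (LWP::Protocol::https installed), and uses hypothetical helper names `raw_url` and `fetch_wikitext`:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use URI;

# Hypothetical helper: build an index.php URL asking MediaWiki for raw wikitext.
# The default base URL is an assumption for English Wikipedia.
sub raw_url {
    my ($title, $base) = @_;
    $base //= 'https://en.wikipedia.org/w/index.php';
    my $uri = URI->new($base);
    $uri->query_form(title => $title, action => 'raw');   # escapes the values
    return $uri->as_string;
}

# Hypothetical helper: fetch the wikitext with a descriptive agent string
sub fetch_wikitext {
    my ($title) = @_;
    my $ua  = LWP::UserAgent->new(agent => 'WikiBot/0.1');   # example name
    my $res = $ua->get(raw_url($title));
    die $res->status_line, "\n" unless $res->is_success;
    return $res->decoded_content;
}

# Only touch the network when a page title is given on the command line
if (@ARGV) {
    binmode STDOUT, ':encoding(UTF-8)';
    print fetch_wikitext($ARGV[0]);
}
```

Usage: `perl wikitext.pl Stack_overflow`. For structured queries, editing, or login, the dedicated MediaWiki modules are the better fit.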
