Why can't I fetch Wikipedia pages with LWP::Simple?

我在风中等你 2021-01-11 16:52

I'm trying to fetch Wikipedia pages using LWP::Simple, but they're not coming back. This code:

#!/usr/bin/perl
use strict;
use LWP::Simple;

print get("ht

5 Answers
  • 2021-01-11 17:27

    You can also just set the UA on the LWP::Simple module - just import the $ua variable, and it'll allow you to modify the underlying UserAgent:

    use LWP::Simple qw/get $ua/;
    $ua->agent("WikiBot/0.1");
    print get("http://en.wikipedia.org/wiki/Stack_overflow");
    
  • 2021-01-11 17:30

    Apparently Wikipedia blocks LWP::Simple requests: http://www.perlmonks.org/?node_id=695886

    The following works instead:

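    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;

    my $url = "http://en.wikipedia.org/wiki/Stack_overflow";

    # LWP::UserAgent sends its own default User-Agent string, not the LWP::Simple one
    my $ua  = LWP::UserAgent->new();
    my $res = $ua->get($url);

    # Report the HTTP status line instead of printing an error page
    die "Request failed: ", $res->status_line, "\n" unless $res->is_success;

    print $res->content;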
    
  • 2021-01-11 17:35

    Also see the MediaWiki-related CPAN modules. These are designed to talk to MediaWiki sites (of which Wikipedia is one) and might give you more bells and whistles than plain LWP.

    http://cpan.uwinnipeg.ca/search?query=Mediawiki&mode=dist
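
    As a rough illustration of what such a module looks like in use, here is a minimal sketch with MediaWiki::API, one of the MediaWiki modules on CPAN. The endpoint URL is an assumption about where English Wikipedia serves its API, and the page title is taken from the command line so nothing is fetched unless you ask for it.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use MediaWiki::API;

    # Point the client at the wiki's api.php endpoint.
    # Assumed URL; https also requires LWP::Protocol::https to be installed.
    my $mw = MediaWiki::API->new({ api_url => 'https://en.wikipedia.org/w/api.php' });

    # Only hit the network when a page title is passed on the command line.
    if (@ARGV) {
        my $page = $mw->get_page({ title => $ARGV[0] })
            or die $mw->{error}->{code}, ': ', $mw->{error}->{details}, "\n";

        binmode STDOUT, ':encoding(UTF-8)';

        # The wikitext of the latest revision is stored under the '*' key;
        # it is absent when the page does not exist.
        print $page->{'*'} // "Page not found\n";
    }
    ```

    Run it as `perl script.pl Stack_overflow`. Note that this returns wikitext rather than rendered HTML, so it is not a drop-in replacement for get().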

  • 2021-01-11 17:39

    Because Wikipedia is blocking the HTTP user-agent string used by LWP::Simple.

    You will get a "403 Forbidden" response if you try using it.

    To work around this, use the LWP::UserAgent module and set its agent attribute.
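
    A minimal sketch of that workaround, assuming a placeholder agent name ("WikiBot/0.1", borrowed from another answer in this thread) and two hypothetical helper subs, make_client and fetch_page. The URL comes from the command line, so the script does nothing on the network unless one is given.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;

    # Build a client that identifies itself with an explicit User-Agent string.
    sub make_client {
        my ($agent) = @_;
        my $ua = LWP::UserAgent->new;
        $ua->agent($agent);    # replaces the default libwww-perl string
        $ua->timeout(10);      # seconds
        return $ua;
    }

    # Fetch a URL and return the decoded body, or die with the HTTP status
    # line (for example "403 Forbidden" when the agent string is rejected).
    sub fetch_page {
        my ($ua, $url) = @_;
        my $res = $ua->get($url);
        die "Request failed: ", $res->status_line, "\n" unless $res->is_success;
        return $res->decoded_content;
    }

    # Only hit the network when a URL is passed on the command line.
    if (@ARGV) {
        my $ua = make_client("WikiBot/0.1");    # placeholder agent name
        binmode STDOUT, ':encoding(UTF-8)';
        print fetch_page($ua, $ARGV[0]);
    }
    ```

    Run it as `perl script.pl http://en.wikipedia.org/wiki/Stack_overflow`. Checking is_success first means a blocked request shows up as an error message rather than as empty output.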

  • 2021-01-11 17:42

    I solved this problem by using LWP::RobotUA instead of LWP::UserAgent. The document below explains it. Very little of your code needs to change.

    http://lwp.interglacial.com/ch12_02.htm
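
    For reference, a minimal sketch of the switch. The agent name and contact address are placeholders, not values from the linked document. LWP::RobotUA requires both, fetches robots.txt before contacting a host, and waits between requests to the same host.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::RobotUA;

    # Placeholder robot name and contact address; both are required.
    my $ua = LWP::RobotUA->new('WikiBot/0.1', 'you@example.com');

    # The delay is given in minutes (default: 1). Here: about one second
    # between requests to the same host.
    $ua->delay(1 / 60);

    # Only hit the network when a URL is passed on the command line.
    if (@ARGV) {
        # Same interface as LWP::UserAgent from here on.
        my $res = $ua->get($ARGV[0]);
        die "Request failed: ", $res->status_line, "\n" unless $res->is_success;

        binmode STDOUT, ':encoding(UTF-8)';
        print $res->decoded_content;
    }
    ```

    If the site's robots.txt disallows the path, the request is refused locally and the status line says so, which is a different failure from the server-side 403 described in the other answers.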
