lwp

How to parse a webpage

Submitted by 北城余情 on 2020-01-05 05:37:18
Question: I am attempting to extract hourly data from the EnviroCanada weather page, one row per hour in the form Time | Thigh | Tlow | Humidity, e.g. 7:00 | 23 | 22.9 | 30. Extracted HTML page: <tr> <td headers="header1" class="text-center vertical-center"> 7:00 </td> <td headers="header2" class="media vertical-center"><span class="pull-left"><img class="media-object" height="35" width="35" src="/weathericons/small/02.png" /></span><div class="visible-xs visible-sm"> <br /> <br /> …
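A minimal sketch of one way to attack this, assuming an XPath-capable parser such as HTML::TreeBuilder::XPath is available; the module choice, the command-line URL argument, and the idea that every cell of interest sits in the same <tr> are assumptions, not details confirmed by the excerpt above.

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::TreeBuilder::XPath;   # assumed parser choice; any DOM/XPath module would do

my $url = shift @ARGV or die "usage: $0 <hourly forecast URL>\n";

my $ua  = LWP::UserAgent->new;
my $res = $ua->get($url);
die $res->status_line, "\n" unless $res->is_success;

my $tree = HTML::TreeBuilder::XPath->new_from_content($res->decoded_content);

# Each forecast hour is one <tr>; the time cell carries headers="header1",
# as in the extracted markup shown in the question.
for my $row ($tree->findnodes('//tr[td[@headers="header1"]]')) {
    my @cells = map { $_->as_trimmed_text } $row->findnodes('./td');
    print join(' | ', @cells), "\n";
}
$tree->delete;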

POST API in Perl using LWP::UserAgent with authentication

Submitted by 时光总嘲笑我的痴心妄想 on 2019-12-25 00:36:04
Question: I am trying to use the POST method in Perl to send information to an API. I would like to call the API below, which requires the following inputs. URI: https://www.cryptopia.co.nz/api/SubmitTrade. Input parameters: Market: the market symbol of the trade, e.g. 'DOT/BTC' (not required if 'TradePairId' is supplied); TradePairId: the Cryptopia trade pair identifier, e.g. '100' (not required if 'Market' is supplied); Type: the type of trade, e.g. 'Buy' or 'Sell'; Rate: the rate or price to pay for the …
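A hedged sketch of the POST itself with LWP::UserAgent and HTTP::Request, under the assumption that the endpoint accepts a JSON body. The parameter values, the Amount field, and the commented-out Authorization header (the site used a signed header whose construction is not shown in the excerpt) are placeholders, not a confirmed recipe.

use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use JSON::PP;   # core module; encode_json builds the request body

my $ua = LWP::UserAgent->new;

# Illustrative values only -- either Market or TradePairId is required, not both.
my $body = encode_json({
    Market => 'DOT/BTC',
    Type   => 'Buy',
    Rate   => 0.00000034,
    Amount => 100,
});

my $req = HTTP::Request->new(POST => 'https://www.cryptopia.co.nz/api/SubmitTrade');
$req->content_type('application/json');
$req->content($body);
# $req->header(Authorization => $signed_header);   # site-specific auth, not covered here

my $res = $ua->request($req);
if ($res->is_success) { print $res->decoded_content, "\n" }
else                  { die  $res->status_line, "\n" }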

Perl LWP GET or POST to an SNI SSL URL

Submitted by 坚强是说给别人听的谎言 on 2019-12-23 19:01:46
Question: I have a system that sends data to customers using Perl LWP. They can choose their URL and whether to POST or GET. A new customer recently complained that the service doesn't work, and they suspect it's because their endpoint uses SNI SSL. Looking in the logs, all I see is the error message "(certificate verify failed) (500 read timeout)". Is there any way to tell whether this issue is caused by their SNI SSL or by something else? I think I can solve the problem by turning off verify_hostname, …
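One way to narrow it down, sketched under the assumption that LWP::Protocol::https is backed by IO::Socket::SSL (the endpoint URL below is a stand-in): check whether the installed stack can do client-side SNI at all, then rerun the request with SSL debug tracing turned up so a missing-SNI failure looks different from a plain verification failure.

use strict;
use warnings;
use IO::Socket::SSL;
use LWP::UserAgent;

# SNI only works when LWP::Protocol::https sits on IO::Socket::SSL
# (not Net::SSL/Crypt::SSLeay) and the underlying OpenSSL supports it.
printf "IO::Socket::SSL %s, client SNI supported: %s\n",
    IO::Socket::SSL->VERSION,
    (IO::Socket::SSL->can_client_sni ? 'yes' : 'no');

# Verbose handshake tracing; the certificate the server actually presents
# shows up here, which usually settles the SNI question.
$IO::Socket::SSL::DEBUG = 3;

my $ua  = LWP::UserAgent->new;
my $res = $ua->get('https://customer-endpoint.example/');   # hypothetical customer URL
print $res->status_line, "\n";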

Why can't LWP::UserAgent get this site entirely?

Submitted by 非 Y 不嫁゛ on 2019-12-23 17:45:09
Question: It outputs only a few lines from the beginning.

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
my $response = $ua->get('http://www.eurogamer.net/articles/df-hardware-wii-u-graphics-power-finally-revealed');
print $response->decoded_content;

Answer 1: I ran the following modification:

my $response = $ua->get( 'http://www.eurogamer.net/articles/df-hardware-wii-u-graphics-power-finally-revealed' );
say $response->headers->as_string;

And saw this: Cache…
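A sketch of how to see why the body comes back short (nothing here is specific to this site): LWP records mid-transfer failures in the Client-Aborted and X-Died response headers rather than dying, so printing them alongside the normal headers usually points at the culprit.

use strict;
use warnings;
use feature 'say';
use LWP::UserAgent;

my $ua  = LWP::UserAgent->new;
my $res = $ua->get('http://www.eurogamer.net/articles/df-hardware-wii-u-graphics-power-finally-revealed');

say $res->status_line;
say $res->headers->as_string;

# Failures that happen while the body is being read or decoded are noted
# in these headers, so a truncated page usually comes with an explanation.
say 'Client-Aborted: ', $res->header('Client-Aborted') // 'none';
say 'X-Died: ',         $res->header('X-Died')         // 'none';

say 'Bytes received: ', length($res->content);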

How may I bypass LWP's URL encoding for a GET request?

Submitted by 南楼画角 on 2019-12-23 12:28:46
Question: I'm talking to what seems to be a broken HTTP daemon, and I need to make a GET request that includes a pipe | character in the URL. LWP::UserAgent escapes the pipe character before the request is sent. For example, a URL passed in as https://hostname/url/doSomethingScript?ss=1234&activities=Lec1|01 reaches the HTTP daemon as https://hostname/url/doSomethingScript?ss=1234&activities=Lec1%7C01. This is correct behaviour, but it doesn't work with this broken server. How can I override or bypass the …
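One workaround is to drop below LWP for this one request and speak to the server with Net::HTTPS (from the same libwww family), which writes the request line as given and so leaves the pipe alone. A sketch, reusing the hostname and path from the example above:

use strict;
use warnings;
use Net::HTTPS;

my $s = Net::HTTPS->new(Host => 'hostname') or die $@;

# The path goes into the request line verbatim -- the | survives.
$s->write_request(
    GET          => '/url/doSomethingScript?ss=1234&activities=Lec1|01',
    'User-Agent' => 'Mozilla/5.0',
);

my ($code, $mess, %headers) = $s->read_response_headers;
print "$code $mess\n";

while (1) {
    my $buf;
    my $n = $s->read_entity_body($buf, 1024);
    die "read failed: $!" unless defined $n;
    last unless $n;
    print $buf;
}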

Cookies in perl lwp

Submitted by 本小妞迷上赌 on 2019-12-23 03:14:07
Question: I once wrote a simple 'crawler' in Java to download HTTP pages for me. Now I'm trying to rewrite the same thing in Perl, using the LWP module. This is my Java code (which works fine):

String referer = "http://example.com";
String url = "http://example.com/something/cgi-bin/something.cgi";
String params = "a=0&b=1";
HttpState initialState = new HttpState();
HttpClient httpclient = new HttpClient();
httpclient.setState(initialState);
httpclient.getParams().setCookiePolicy(CookiePolicy.NETSCAPE);
…
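A sketch of the usual LWP equivalent, assuming the same two example URLs and form fields: give the UserAgent an HTTP::Cookies jar, let the first GET collect any Set-Cookie headers, and let the follow-up POST send them back automatically.

use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

my $referer = 'http://example.com';
my $url     = 'http://example.com/something/cgi-bin/something.cgi';

# Persistent jar; drop the file/autosave options for an in-memory jar.
my $jar = HTTP::Cookies->new(file => 'cookies.txt', autosave => 1, ignore_discard => 1);
my $ua  = LWP::UserAgent->new(cookie_jar => $jar);

# First request fills the jar ...
$ua->get($referer);

# ... and the POST sends the stored cookies back without extra work.
my $res = $ua->post($url, { a => 0, b => 1 }, Referer => $referer);
if ($res->is_success) { print $res->decoded_content }
else                  { die  $res->status_line, "\n" }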

LWP::UserAgent Can't Post with TLS1.1

Submitted by 青春壹個敷衍的年華 on 2019-12-22 07:08:02
Question: I'm getting a 500 handshake error on port 443 over HTTPS. The host service I am sending XML to does not support TLS 1.2; they do support 1.0 and 1.1. I'm currently using LWP 6.03 on CentOS 6. With the code below, they claim I am still connecting with TLS 1.2.

use LWP::UserAgent;
$ua = LWP::UserAgent->new(ssl_opts => { verify_hostname => 0, SSL_version => 'SSLv23:!TLSv12' });
$req = HTTP::Request->new(GET => 'https://secure-host-server');
$res = $ua->request($req);
if ($res->is_success) { print $res->content; }
…
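A sketch of an alternative worth trying: pin the handshake to TLS 1.1 explicitly instead of excluding 1.2. The 'TLSv1_1' spelling is IO::Socket::SSL's, and whether it takes effect also depends on LWP::Protocol::https actually using IO::Socket::SSL as its backend on this box, which the excerpt does not confirm.

use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new(
    ssl_opts => {
        verify_hostname => 0,
        SSL_version     => 'TLSv1_1',   # pin the version rather than exclude 1.2
    },
);

my $res = $ua->get('https://secure-host-server');
if ($res->is_success) { print $res->content }
else                  { print $res->status_line, "\n" }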

How can I get the ultimate URL without fetching the pages using Perl and LWP?

Submitted by 蓝咒 on 2019-12-22 05:34:19
Question: I'm doing some web scraping using Perl's LWP. I need to process a set of URLs, some of which may redirect (one or more times). How can I get the ultimate URL, with all redirects resolved, using the HEAD method? Answer 1: If you use the fully featured version of LWP::UserAgent, then the response that is returned is an instance of HTTP::Response, which in turn has an HTTP::Request as an attribute. Note that this is NOT necessarily the same HTTP::Request that you created with the original URL in your set of URLs …
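A sketch following the answer's hint: issue a HEAD request, let LWP follow the redirects, and read the final URL back from the HTTP::Request attached to the response. The max_redirect value is arbitrary, and some servers answer HEAD less reliably than GET.

use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new(max_redirect => 7);

for my $url (@ARGV) {
    my $res = $ua->head($url);
    # After the chain has been followed, the request attached to the final
    # response carries the URL that was actually fetched.
    my $final = $res->request->uri;
    printf "%s => %s (%s)\n", $url, $final, $res->status_line;
}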

Scripts broke after upgrading LWP “certificate verify failed”

Submitted by 巧了我就是萌 on 2019-12-19 05:07:08
Question: I have a lot of scripts, most of them based around WWW::Mechanize, that scrape data off miscellaneous hardware accessible via HTTPS. After upgrading most of my Perl installation and its modules, all scripts using https:// broke with "certificate verify failed". This is because newer versions of LWP do a proper check on the certificate and die if something doesn't match. In my case the failed certificate verification is expected due to the circumstances, so I …
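One conventional way out, sketched for a WWW::Mechanize-based script (the device URL is a placeholder): pass ssl_opts through to the underlying LWP::UserAgent and switch verification off for this agent only, or set PERL_LWP_SSL_VERIFY_HOSTNAME before the first HTTPS request, accepting the security trade-off in both cases.

use strict;
use warnings;
use WWW::Mechanize;

# Disable peer/hostname verification for this agent only, since the
# certificate mismatch is expected for these devices.
my $mech = WWW::Mechanize->new(
    ssl_opts => {
        verify_hostname => 0,
        SSL_verify_mode => 0,   # IO::Socket::SSL's SSL_VERIFY_NONE
    },
);

# The environment-variable route also works, if set before the first request:
# $ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;

$mech->get('https://192.0.2.1/status');   # placeholder device URL
print $mech->content;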

How can I extract XML of a website and save in a file using Perl's LWP?

Submitted by 北城以北 on 2019-12-14 03:49:38
Question: How can I extract information from a website (http://tv.yahoo.com/listings) and then create an XML file out of it? I want to save it so I can parse it later and display the information using JavaScript. I am quite new to Perl and have no idea how to do this. Answer 1: Of course. The easiest way would be the Web::Scraper module. It lets you define scraper objects that consist of hash key names, XPath expressions that locate elements of interest, and code to extract bits of data from …
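A sketch of the Web::Scraper-plus-XML approach the answer describes. The CSS selectors ('.program', '.title', '.time') are invented placeholders that would have to be replaced with whatever the real listings markup uses, and XML::Simple stands in for any XML writer.

use strict;
use warnings;
use URI;
use Web::Scraper;
use XML::Simple;   # exports XMLout by default

# Placeholder selectors -- inspect the real page and adjust.
my $listings = scraper {
    process '.program', 'shows[]' => scraper {
        process '.title', title => 'TEXT';
        process '.time',  time  => 'TEXT';
    };
};

my $data = $listings->scrape(URI->new('http://tv.yahoo.com/listings'));

# Write the scraped structure out as XML for the JavaScript side to consume.
open my $fh, '>', 'listings.xml' or die "listings.xml: $!";
print {$fh} XMLout($data, RootName => 'listings', NoAttr => 1);
close $fh;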