问题
I'm talking to what seems to be a broken HTTP daemon and I need to make a GET
request that includes a pipe |
character in the URL.
LWP::UserAgent escapes the pipe character before the request is sent.
For example, a URL passed in as:
https://hostname/url/doSomethingScript?ss=1234&activities=Lec1|01
is passed to the HTTP daemon as
https://hostname/url/doSomethingScript?ss=1234&activities=Lec1%7C01
This is correct, but doesn't work with this broken server.
How can I override or bypass the encoding that LWP and its friends are doing?
Note
I've seen and tried other answers here on StackOverflow addressing similar problems. The difference here seems to be that those answers are dealing with POST
requests where the formfield
parts of the URL can be passed as an array of key/value pairs or as a 'Content' => $content
parameter. Those approaches aren't working for me with an LWP request.
I've also tried constructing an HTTP::Request
object and passing that to LWP, and passing the full URL direct to LWP->get()
. No dice with either approach.
In response to Borodin's request, this is a sanitised version of the code I'm using
#!/usr/local/bin/perl -w
use HTTP::Cookies;
use LWP;
my $debug = 1;
# make a 'browser' object
my $browser = LWP::UserAgent->new();
# cookie handling...
$browser->cookie_jar(HTTP::Cookies->new(
'file' => '.cookie_jar.txt',
'autosave' => 1,
'ignore_discard' => 1,
));
# proxy, so we can watch...
if ($debug == 1) {
$browser->proxy(['http', 'ftp', 'https'], 'http://localhost:8080/');
}
# user agent string (pretend to be Firefox)
$agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.7.12) Gecko/20050919 Firefox/1.0.7';
# set the user agent
$browser->agent($agent);
# do some things here to log in to the web site, accept session cookies, etc.
# These are basic POSTs of filled forms. Works fine.
# [...]
my $baseURL = 'https://hostname/url/doSomethingScript?ss=1234&activities=VALUEA|VALUEB';
@values = ['Lec1', '01', 'Lec1', '02'];
while (1) {
if (scalar(@values) < 2) { last; }
my $vala = shift(@values);
my $valb = shift(@values);
my $url = $basEURL;
$url =~ s/VALUEA/$vala/g;
$url =~ s/VALUEB/$valb/g;
# simplified. Would usually check request for '200' response, etc...
$content = $browser->get($url)->content();
# do something here with the content
# [...]
# fails because the '|' character in the url is escaped after it's handed
# to LWP
}
# end
回答1:
As @bchgys mentions in his comment, this is (almost) answered in the linked thread. Here are two solutions:
The first and arguably cleanest one is to locally override the escape map in URI::Escape to not modify the pipe character:
use URI;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new();
my $res;
{
# Violate RFC 2396 by forcing broken query string
# local makes the override take effect only in the current code block
local $URI::Escape::escapes{'|'} = '|';
$res = $ua->get('http://server/script?q=a|b');
}
print $res->request->as_string, "\n";
Alternatively, you can simply undo the escaping by modifying the URI directly in the request after the request has been created:
use HTTP::Request;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new();
my $req = HTTP::Request->new(GET => 'http://server/script?q=a|b');
# Violate RFC 2396 by forcing broken query string
${$req->uri} =~ s/%7C/|/;
my $res = $ua->request($req);
print $res->request->as_string, "\n";
The first solution is almost certainly preferable because it at least relies on the %URI::Escape::escapes
package variable which is exported and documented, so that's probably as close as you're gonna get to doing this with a supported API.
Note that in either case you are in violation of RFC 2396 but as mentioned you may have no choice when talking to a broken server that you have no control over.
来源:https://stackoverflow.com/questions/21977147/how-may-i-bypass-lwps-url-encoding-for-a-get-request