Question
I am using the `Spatie\Crawler` crawler software in a fairly standard way, like so:
```php
use GuzzleHttp\Client;
use GuzzleHttp\RequestOptions;
use Spatie\Crawler\Crawler;

$client = new Client([
    RequestOptions::COOKIES => true,
    RequestOptions::CONNECT_TIMEOUT => 10,
    RequestOptions::TIMEOUT => 10,
    RequestOptions::ALLOW_REDIRECTS => true,
]);

$crawler = new Crawler($client, 1);
$crawler
    ->setCrawlProfile(new MyCrawlProfile($startUrl, $pathRegex))
    ->setCrawlObserver(new MyCrawlObserver())
    ->startCrawling($url);
```
I've omitted the definitions of the classes `MyCrawlProfile` and `MyCrawlObserver` for brevity, but in any case this works as it stands.
I want to add some middleware in order to change some requests before they are made, so I added this demo code:
```php
use GuzzleHttp\Client;
use GuzzleHttp\Handler\CurlHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;
use GuzzleHttp\RequestOptions;
use Psr\Http\Message\RequestInterface;

$stack = new HandlerStack();
$stack->setHandler(new CurlHandler());
$stack->push(
    Middleware::mapRequest(function (RequestInterface $request) {
        echo "Middleware running\n";
        return $request;
    })
);

$client = new Client([
    RequestOptions::COOKIES => true,
    RequestOptions::CONNECT_TIMEOUT => 10,
    RequestOptions::TIMEOUT => 10,
    RequestOptions::ALLOW_REDIRECTS => true,
    'handler' => $stack,
]);

// ... rest of crawler code here ...
```
However, it falls at the first hurdle: it scrapes the root of the site (`/`), which is actually a `Location` redirect, and then stops. It turns out that I am now missing the `RedirectMiddleware`, even though I did not deliberately remove it.
So, my problem is fixed by also adding this:
```php
$stack->push(Middleware::redirect());
```
I now wonder what else Guzzle sets up by default that I have accidentally removed by creating a fresh `HandlerStack`. Cookies? Retry mechanisms? Other stuff? I don't need those things right now, but I'd be more confident about my system's long-term reliability if my code merely modified the existing stack.
Is there a way to do that? As far as I can tell, I'm doing things as per the manual.
Answer 1:
Use

```php
$stack = HandlerStack::create();
```

instead of

```php
$stack = new HandlerStack();
$stack->setHandler(new CurlHandler());
```
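This matters because `create()` adds Guzzle's default middleware to the stack, not just the redirect middleware. In Guzzle 6 it pushes four named middleware: `http_errors`, `allow_redirects`, `cookies` and `prepare_body`. Without them, the `http_errors`, `allow_redirects` and `cookies` request options silently do nothing, and request bodies no longer get headers such as `Content-Type` and `Content-Length` filled in. No retry middleware is added by default. `create()` also chooses a suitable handler for you; if you specifically want cURL, pass it in: `HandlerStack::create(new CurlHandler())`.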
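A minimal sketch of the fix end to end, assuming Guzzle 6.x or 7.x is installed with Composer. So that it runs without network access, it swaps the real cURL handler for Guzzle's `MockHandler`, queued with a made-up 302 redirect from `/` to `/next` followed by a 200 response; the host `example.test` and the middleware name `record_uris` are likewise hypothetical. The recording middleware stands in for the question's demo middleware.

```php
<?php
// Assumption: Guzzle 6.x or 7.x installed via Composer.
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;
use GuzzleHttp\Psr7\Response;
use GuzzleHttp\RequestOptions;
use Psr\Http\Message\RequestInterface;

// Fake transport for illustration only: the first request gets a redirect,
// the second gets the final page. In real code, call HandlerStack::create()
// with no argument (or with new CurlHandler()) to use a real handler.
$mock = new MockHandler([
    new Response(302, ['Location' => '/next']),
    new Response(200, [], 'ok'),
]);

// create() wraps the handler in Guzzle's default middleware
// (http_errors, allow_redirects, cookies, prepare_body).
$stack = HandlerStack::create($mock);

// The custom middleware is added on top of the defaults instead of replacing them.
$seen = [];
$stack->push(Middleware::mapRequest(function (RequestInterface $request) use (&$seen) {
    $seen[] = (string) $request->getUri();
    return $request;
}), 'record_uris');

$client = new Client([
    RequestOptions::COOKIES => true,
    RequestOptions::ALLOW_REDIRECTS => true,
    'handler' => $stack,
]);

$response = $client->request('GET', 'http://example.test/');

// The redirect was followed, and the custom middleware saw both requests.
echo $response->getStatusCode(), ' ', $response->getBody(), PHP_EOL;
echo implode(PHP_EOL, $seen), PHP_EOL;
```

Because the custom middleware is pushed last, it runs after the built-in middleware, closest to the handler, so it also sees each request the redirect middleware issues. If it needs to sit elsewhere, `HandlerStack` also provides `unshift()`, `before()` and `after()` for positioning relative to the named defaults.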
Source: https://stackoverflow.com/questions/43252730/can-i-add-middleware-to-the-default-guzzle-6-handlerstack-rather-than-creating