Issue:
I can't fully understand the Goutte web scraper.
Request:
Can someone please help me understand it, or provide some code to help?
The documentation you want to look at is that of the Symfony2 DomCrawler component.
Goutte is a client built on top of Guzzle that returns a Crawler every time you request or submit something:
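require 'vendor/autoload.php'; // Composer's autoloader
use Goutte\Client;
use Symfony\Component\DomCrawler\Crawler; // the class the closure below type-hints
$client = new Client();
// request() returns a Crawler over the fetched page
$crawler = $client->request('GET', 'http://www.symfony-project.org/');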
With this crawler you can do things like get all the p tags that are direct children of body:
$nodeValues = $crawler->filter('body > p')->each(function (Crawler $node, $i) {
    return $node->text();
});
print_r($nodeValues);
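To try filter() and each() without hitting a live site, you can also build a Crawler directly from an HTML string. This is a minimal sketch that assumes symfony/dom-crawler and symfony/css-selector are installed through Composer (Goutte normally pulls both in, and filter() needs css-selector to translate CSS into XPath). The page markup is made up for illustration.

```php
<?php
// Assumes symfony/dom-crawler and symfony/css-selector are installed via Composer.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical page, so the example runs without any network access.
$html = <<<'HTML'
<html>
  <body>
    <p>First paragraph</p>
    <div><p>Nested inside a div, so it is skipped</p></div>
    <p>Second paragraph</p>
  </body>
</html>
HTML;

$crawler = new Crawler($html);

// "body > p" matches only <p> elements that are direct children of <body>.
// each() calls the closure once per match and collects the return values.
$nodeValues = $crawler->filter('body > p')->each(function (Crawler $node, $i) {
    return $i . ': ' . $node->text();
});

print_r($nodeValues);
```

The closure receives each matched node wrapped in its own Crawler plus its zero-based index $i, and each() hands back the closure's return values as a plain PHP array.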
Fill and submit forms:
$form = $crawler->selectButton('sign in')->form();
$crawler = $client->submit($form, array(
    'username' => 'username',
    'password' => 'xxxxxx'
));
The Crawler's selectButton() method returns another Crawler that matches a button (input[type=submit], input[type=image], or a button element) with the given text; calling form() on it gives you the form that button belongs to. [1]
You can also click on links, set options, select check-boxes and more; see the Form and Link support section of the DomCrawler documentation.
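Here is a rough, offline sketch of the link and form helpers, run against a made-up login page instead of a live site. It assumes the same Composer packages (symfony/dom-crawler and symfony/css-selector); the field names, link text and the http://example.com/ URI are hypothetical. The URI passed to the Crawler constructor matters, because Link and Form objects need an absolute base to resolve relative hrefs and actions.

```php
<?php
// Assumes symfony/dom-crawler and symfony/css-selector are installed via Composer.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical login page; field names and URLs are made up for illustration.
$html = <<<'HTML'
<html><body>
  <a href="/about">About us</a>
  <form action="/login" method="post">
    <input type="text" name="username">
    <input type="password" name="password">
    <input type="checkbox" name="remember" value="1">
    <select name="country">
      <option value="fr">France</option>
      <option value="de">Germany</option>
    </select>
    <button type="submit">sign in</button>
  </form>
</body></html>
HTML;

// Second argument: the page's URI, used to resolve relative URLs.
$crawler = new Crawler($html, 'http://example.com/');

// Links: selectLink() matches by link text, link() builds a Link object.
$link = $crawler->selectLink('About us')->link();
echo $link->getUri(), "\n";

// Forms: selectButton() finds the button, form() returns its enclosing form.
$form = $crawler->selectButton('sign in')->form();
$form['username'] = 'username';
$form['password'] = 'xxxxxx';
$form['remember']->tick();      // check the check-box
$form['country']->select('de'); // pick an <option> by its value

echo $form->getMethod(), ' ', $form->getUri(), "\n";
print_r($form->getValues());

// Nothing is sent here. With a Goutte client you would follow up with
// $crawler = $client->click($link);  or  $crawler = $client->submit($form);
```

getValues() only returns fields that currently carry a value, so an unticked check-box would simply be missing from the array.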
To get data from the crawler, use the html or text methods:
echo $crawler->html();
echo $crawler->text();
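A small offline sketch of the difference between the two, assuming the same Composer setup and a made-up HTML snippet. Both methods read only the first node held by the Crawler, so it usually pays to filter() down to the element you want first.

```php
<?php
// Assumes symfony/dom-crawler and symfony/css-selector are installed via Composer.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical snippet with two paragraphs.
$crawler = new Crawler('<html><body><p>Hello <b>world</b></p><p>Bye</p></body></html>');

$p = $crawler->filter('p');

echo $p->html(), "\n";  // inner HTML of the first <p>, markup kept
echo $p->text(), "\n";  // text content of the first <p>, tags stripped
echo count($p), "\n";   // both <p> elements matched, but only the first was read above
```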