I am trying to extract data from a webpage to insert it into a database. The data I'm interested in is in the divs which have class="company". On one webpage there are
To check whether a node exists, look at the length property of the DOMNodeList returned by the query; here it must be exactly 1:
if ($company_name->length == 1) {
    $object->company_name = trim($company_name->item(0)->nodeValue);
}
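For context, here is a minimal sketch of where `$company_name` could come from: query every company element, then run a relative query with that element as the context node, and read `nodeValue` only when exactly one node matched. The sample markup and the `stdClass` objects are assumptions for illustration, not the real page.

```php
<?php
// Made-up markup with one company div (assumption, not the real page).
$html = '<div class="company" id="company-6666">'
      . '<a href="/company-name"> Company Name </a>'
      . '</div>';

// Real-world pages are rarely valid HTML; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$objects = [];

foreach ($xpath->query('//*[@class="company"]') as $companyNode) {
    $object = new stdClass();
    // Relative query: './/a[1]' is evaluated with $companyNode as the context node.
    $company_name = $xpath->query('.//a[1]', $companyNode);
    // The DOMNodeList's length tells whether the node exists; read it only then.
    if ($company_name->length == 1) {
        $object->company_name = trim($company_name->item(0)->nodeValue);
    }
    $objects[] = $object;
}

echo $objects[0]->company_name, "\n"; // Company Name
```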
Each company can be represented by a context node, with each of its properties represented by an XPath expression relative to that node:
Company company-6666:
->id ....... = "company-6666" -- string(@id)
->name ..... = "Company Name" -- .//a[1]/text()
->href ..... = "/company-name" -- .//a[1]/@href
->img ...... = "/graphics/company/logo/listing/123456.jpg?_ts=1365390237" -- .//img[1]/@src
->address .. = "StreetName 500, 7777 City, County" -- .//*[@class="address"]/text()
...
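The mapping above can also be evaluated with nothing but PHP's built-in DOM classes, which may help show what "relative to a context node" means: `DOMXPath::evaluate()` takes the company element as its second argument, returns a plain string for expressions such as `string(@id)`, and returns a DOMNodeList for path expressions. This is a sketch; the sample markup, the `$properties` array and the `valueOf()` helper are hypothetical.

```php
<?php
// Made-up markup shaped like the listing above (assumption, not the real page).
$html = '<div class="company" id="company-6666">'
      . '<a href="/company-name">Company Name</a>'
      . '<img src="/graphics/company/logo/listing/123456.jpg?_ts=1365390237">'
      . '<span class="address">StreetName 500, 7777 City, County</span>'
      . '</div>';

// Property name => XPath expression relative to the company node.
$properties = [
    'id'      => 'string(@id)',
    'name'    => './/a[1]/text()',
    'href'    => './/a[1]/@href',
    'img'     => './/img[1]/@src',
    'address' => './/*[@class="address"]/text()',
];

// Hypothetical helper: turn an evaluate() result into a plain string (or null).
function valueOf($result)
{
    if ($result instanceof DOMNodeList) {
        // Node-set results: use the first matching node, if there is one.
        return $result->length ? trim($result->item(0)->nodeValue) : null;
    }
    // string(...) and similar XPath functions already return a scalar.
    return (string) $result;
}

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$rows = [];
foreach ($xpath->query('//*[@class="company"]') as $context) {
    $row = [];
    foreach ($properties as $name => $expression) {
        // The second argument makes the expression relative to $context.
        $row[$name] = valueOf($xpath->evaluate($expression, $context));
    }
    $rows[] = $row;
}

print_r($rows);
```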
If you wrap that into objects, it becomes quite convenient to use:
$doc = new DOMDocument();
$doc->loadHTML($html);

/* @var $companies DOMValueObject[] */
$companies = new Companies($doc);

foreach ($companies as $company) {
    printf("Company %s:\n", $company->id);
    foreach ($company->getObjectProperties() as $name => $value) {
        $expression = $company->getPropertyExpression($name);
        printf(" ->%'.-10s = \"%s\" -- %s\n", $name.' ', $value, $expression);
    }
}
This builds on two helper classes, DOMValueCollection and DOMValueObject (they are not part of PHP's DOM extension); you define your own type by extending the collection:
class Companies extends DOMValueCollection
{
    public function __construct(DOMDocument $doc) {
        parent::__construct($doc, '//*[@class="company"]');
    }

    /**
     * @return DOMValueObject
     */
    public function current() {
        $object = parent::current();
        $object->defineProperty('id', 'string(@id)');
        $object->defineProperty('name', './/a[1]/text()');
        $object->defineProperty('href', './/a[1]/@href');
        $object->defineProperty('img', './/img[1]/@src');
        $object->defineProperty('address', './/*[@class="address"]/text()');
        # ... add your definitions
        return $object;
    }
}
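The two base classes are not shown here. Purely as a hypothetical stand-in, the sketch below implements just the methods this answer calls (`defineProperty()`, `getPropertyExpression()`, `getObjectProperties()`, magic property access and `getArrayCopy()`) so the `Companies` class can be tried out; the real classes may well be built differently. The sample markup is made up to match the output shown in this answer.

```php
<?php
// Hypothetical stand-in: evaluates named XPath expressions relative to one node.
class DOMValueObject
{
    private $xpath, $node, $expressions = [];

    public function __construct(DOMXPath $xpath, DOMNode $node)
    {
        $this->xpath = $xpath;
        $this->node = $node;
    }

    public function defineProperty($name, $expression) { $this->expressions[$name] = $expression; }
    public function getPropertyExpression($name) { return $this->expressions[$name]; }

    // $object->name evaluates the expression registered under 'name'.
    public function __get($name)
    {
        $result = $this->xpath->evaluate($this->expressions[$name], $this->node);
        if ($result instanceof DOMNodeList) {
            // Path expressions: text of the first matching node, or null if none.
            return $result->length ? trim($result->item(0)->nodeValue) : null;
        }
        return (string) $result; // string(@id) already evaluates to a string
    }

    public function getObjectProperties()
    {
        $values = [];
        foreach (array_keys($this->expressions) as $name) {
            $values[$name] = $this->__get($name);
        }
        return $values;
    }
}

// Hypothetical stand-in: iterates the nodes matched by one XPath expression.
class DOMValueCollection implements Iterator
{
    protected $xpath, $nodes, $position = 0;

    public function __construct(DOMDocument $doc, $expression)
    {
        $this->xpath = new DOMXPath($doc);
        $this->nodes = $this->xpath->query($expression);
    }
    #[\ReturnTypeWillChange]
    public function current() { return new DOMValueObject($this->xpath, $this->nodes->item($this->position)); }
    #[\ReturnTypeWillChange]
    public function key() { return $this->position; }
    #[\ReturnTypeWillChange]
    public function next() { $this->position++; }
    #[\ReturnTypeWillChange]
    public function rewind() { $this->position = 0; }
    #[\ReturnTypeWillChange]
    public function valid() { return $this->position < $this->nodes->length; }

    // One plain array per matched node; foreach goes through current(), so subclasses apply.
    public function getArrayCopy()
    {
        $copy = [];
        foreach ($this as $object) {
            $copy[] = $object->getObjectProperties();
        }
        return $copy;
    }
}

// Companies as defined above, repeated so this sketch runs on its own.
class Companies extends DOMValueCollection
{
    public function __construct(DOMDocument $doc)
    {
        parent::__construct($doc, '//*[@class="company"]');
    }
    #[\ReturnTypeWillChange]
    public function current()
    {
        $object = parent::current();
        $object->defineProperty('id', 'string(@id)');
        $object->defineProperty('name', './/a[1]/text()');
        $object->defineProperty('href', './/a[1]/@href');
        $object->defineProperty('img', './/img[1]/@src');
        $object->defineProperty('address', './/*[@class="address"]/text()');
        return $object;
    }
}

// Made-up markup shaped like the output shown in this answer.
$doc = new DOMDocument();
$doc->loadHTML('<div class="company" id="company-6666"><a href="/company-name">Company Name</a>'
    . '<img src="/graphics/company/logo/listing/123456.jpg?_ts=1365390237">'
    . '<span class="address">StreetName 500, 7777 City, County</span></div>');
$companies = new Companies($doc);
print_r($companies->getArrayCopy());
```

Implementing `Iterator` by hand keeps the position bookkeeping explicit; the `#[\ReturnTypeWillChange]` attributes only silence PHP 8.1+ deprecation notices and are read as comments by PHP 7.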
And for your array requirement there is a getArrayCopy() method:
echo "\nGet Array Copy:\n\n";
print_r($companies->getArrayCopy());
Output:
Get Array Copy:

Array
(
    [0] => Array
        (
            [id] => company-6666
            [name] => Company Name
            [href] => /company-name
            [img] => /graphics/company/logo/listing/123456.jpg?_ts=1365390237
            [address] => StreetName 500, 7777 City, County
        )

)
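Since the goal is to get this data into a database, here is a sketch of inserting rows shaped like the getArrayCopy() output using PDO prepared statements. It assumes an in-memory SQLite database so it runs on its own (this needs the pdo_sqlite extension; swap the DSN for your real database), and the `companies` table layout is a hypothetical choice. The row values are the ones printed above.

```php
<?php
// The array printed above, copied as literal data for this sketch.
$rows = [
    [
        'id'      => 'company-6666',
        'name'    => 'Company Name',
        'href'    => '/company-name',
        'img'     => '/graphics/company/logo/listing/123456.jpg?_ts=1365390237',
        'address' => 'StreetName 500, 7777 City, County',
    ],
];

// In-memory SQLite keeps the sketch self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table layout: one column per extracted property.
$pdo->exec('CREATE TABLE companies (
    id TEXT PRIMARY KEY, name TEXT, href TEXT, img TEXT, address TEXT
)');

// Named placeholders match the array keys, so each row can be passed as-is.
$insert = $pdo->prepare(
    'INSERT INTO companies (id, name, href, img, address)
     VALUES (:id, :name, :href, :img, :address)'
);

// One transaction around all inserts avoids a commit per row.
$pdo->beginTransaction();
foreach ($rows as $row) {
    $insert->execute($row);
}
$pdo->commit();

$count = (int) $pdo->query('SELECT COUNT(*) FROM companies')->fetchColumn();
echo $count, "\n"; // 1
```

Binding values through placeholders instead of concatenating them into the SQL also keeps scraped text (quotes in company names, for example) from breaking the query.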