I have a file that looks like this:
About.com: Animation Guide
Here's an alternate way to get to that attribute:
$string = file_get_contents($filename);
$xml = new SimpleXMLElement($string);
$result = $xml->xpath('/RDF/ExternalPage[*]/@about');
var_dump($result);
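One caveat with this XPath: if the root element declares a default namespace, unprefixed names in the expression will not match anything, and a prefix has to be registered first. Below is a minimal sketch of that situation. The sample document, the namespace URI `http://example.org/rdf/` and the prefix `x` are all made up for illustration; substitute whatever your file actually declares.

```php
<?php
// Hypothetical sample; the namespace URI and content are assumptions for illustration.
$string = <<<XML
<RDF xmlns="http://example.org/rdf/">
  <ExternalPage about="http://example.org/a"/>
  <ExternalPage about="http://example.org/b"/>
</RDF>
XML;

$xml = new SimpleXMLElement($string);

// Unprefixed names only match elements in no namespace, so this finds nothing here.
$plain = $xml->xpath('/RDF/ExternalPage/@about') ?: [];

// Bind a prefix of our choosing to the default namespace URI and use it in the query.
$xml->registerXPathNamespace('x', 'http://example.org/rdf/');
$urls = array_map('strval', $xml->xpath('/x:RDF/x:ExternalPage/@about'));

var_dump(count($plain), $urls);
```

The `about` attribute itself carries no prefix in this sample, so it stays unprefixed in the query.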
It'll take time and proper debugging to come up with working pure XMLReader code. Meanwhile try this hybrid method:
$xmlR = new XMLReader;
$xmlR->open('dbpedia/links/xml.xml');
//Skip until <ExternalPage> node
while ($xmlR->read() && $xmlR->name !== 'ExternalPage');
$loadedNS_f = false;
while ($xmlR->name === 'ExternalPage')
{
    //Read the entire parent tag with children
    $sxmlNode = new SimpleXMLElement($xmlR->readOuterXML());
    //collect all namespaces in node recursively once; assuming all nodes are similar
    if (!$loadedNS_f) {
        $tagNS = $sxmlNode->getNamespaces(true);
        $loadedNS_f = true;
    }
    $URL = (string) $sxmlNode['about'];
    $dNS = $sxmlNode->children($tagNS['d']);
    $Title = (string) $dNS->Title;
    $Desc = (string) $dNS->Description;
    $Topic = (string) $sxmlNode->topic;
    var_dump($URL, $Title, $Desc, $Topic);
    // Jump to next <ExternalPage> tag
    $xmlR->next('ExternalPage');
}
$xmlR->close();
This is not working for you because you only read up to the start tag of the d:Title element, and that node has no value:
if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'd:Title') {
    $title = $reader->value;
}
You probably wanted the nodeValue of that DOM element, but that is not what $reader->value returns: for an element node it is empty. Knowing this, there are multiple ways to deal with it:
Expand the node (XMLReader::expand()) and get the nodeValue (quick example):
$title = $reader->expand()->nodeValue;
Process all XMLReader::TEXT (3) and/or XMLReader::CDATA (4) child nodes yourself (decide whether a node is a child node by looking at XMLReader::$depth).
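The second option can be sketched roughly as follows. The helper name `readElementText()` and the sample XML are made up for illustration. The sketch assumes the reader is positioned on an element start tag, and it collects only direct text and CDATA children, using `XMLReader::$depth` to tell them apart from deeper descendants.

```php
<?php
// Hypothetical helper: concatenate the direct TEXT/CDATA children of the current element.
function readElementText(XMLReader $reader): string
{
    if ($reader->nodeType !== XMLReader::ELEMENT || $reader->isEmptyElement) {
        return '';
    }
    $depth = $reader->depth;
    $text  = '';
    while ($reader->read()) {
        // The matching end tag is back at the element's own depth: stop there.
        if ($reader->nodeType === XMLReader::END_ELEMENT && $reader->depth === $depth) {
            break;
        }
        $isText = $reader->nodeType === XMLReader::TEXT
               || $reader->nodeType === XMLReader::CDATA;
        // Direct children sit exactly one level below the element.
        if ($isText && $reader->depth === $depth + 1) {
            $text .= $reader->value;
        }
    }
    return $text;
}

// Made-up sample data, only to exercise the helper.
$reader = new XMLReader();
$reader->XML('<r xmlns:d="urn:example:d"><d:Title>About <![CDATA[& more]]></d:Title></r>');

$title = null;
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'd:Title') {
        $title = readElementText($reader);
        break;
    }
}
$reader->close();
var_dump($title);
```

Unlike `expand()`, this never builds a DOM subtree, but it leaves the reader on the element's end tag, so the calling loop has to continue from there.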
In any case, to streamline your code, consider providing exactly what you need directly, for example by writing your own set of functions or by extending the XMLReader class:
class MyXMLReader extends XMLReader
{
    public function readToNextElement()
    {
        while (
            $result = $this->read()
            and $this->nodeType !== self::ELEMENT
        ) ;
        return $result;
    }

    public function readToNext($localname)
    {
        while (
            $result = $this->readToNextElement()
            and $this->localName !== $localname
        ) ;
        return $result;
    }

    public function readToNextChildElement($depth)
    {
        // if the current element is the parent and
        // empty there are no children to go into
        if ($this->depth == $depth && $this->isEmptyElement) {
            return false;
        }
        while ($result = $this->read()) {
            if ($this->depth <= $depth) return false;
            if ($this->nodeType === self::ELEMENT) break;
        }
        return $result;
    }

    public function getNodeValue($default = NULL)
    {
        $node = $this->expand();
        return $node ? $node->nodeValue : $default;
    }
}
You can then just use this extended class to do your processing:
$reader = new MyXMLReader();
$reader->open($uri);
$num = 0;
while ($reader->readToNext('ExternalPage') and $num < 200) {
    $url = $reader->getAttribute('about');
    $depth = $reader->depth;
    $title = $desc = '';
    while ($reader->readToNextChildElement($depth)) {
        switch ($reader->localName) {
            case 'Title':
                $title = $reader->getNodeValue();
                break;
            case 'Description':
                $desc = trim($reader->getNodeValue());
                break;
        }
    }
    $num++;
    echo "#", $num, ": ", $url, " - ", $title, " - ", $desc, "<br />\n";
}
As you can see, this makes your code much more readable. You also no longer need to check on every step whether you have read to the right node.