Why is this xmlreader code not working?

一个人的身影 2021-01-07 05:05

I have a file that looks like this:

    
       About.com: Animation Guide

        
3 Answers
  • 2021-01-07 05:28

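    Here's an alternative way to get at that attribute, using SimpleXML and XPath. Note that it loads the whole file into memory, so it only suits files that fit comfortably in RAM: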

    $string = file_get_contents($filename);
    $xml = new SimpleXMLElement($string);
    $result = $xml->xpath('/RDF/ExternalPage[*]/@about');
    var_dump($result);
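
    One caveat with XPath: unprefixed steps such as /RDF/ExternalPage only match elements that are in no namespace. The sketch below assumes, hypothetically, that the root element declares a default namespace (the question's file is not shown in full, so this may not apply to it). The URI http://example.org/rdf, the prefix x and the sample data are placeholders.

    ```php
    <?php
    // Made-up sample: the default namespace URI is a placeholder, not the real file's.
    $xml = '<RDF xmlns="http://example.org/rdf">'
         . '<ExternalPage about="http://example.org/a"/>'
         . '<ExternalPage about="http://example.org/b"/>'
         . '</RDF>';

    $doc = new SimpleXMLElement($xml);

    // Unprefixed steps look for elements in no namespace, so nothing matches here.
    $plain = $doc->xpath('/RDF/ExternalPage/@about');

    // Bind an arbitrary prefix to the default namespace URI and use it in every step.
    $doc->registerXPathNamespace('x', 'http://example.org/rdf');
    $urls = array_map(
        function ($attribute) {
            return (string) $attribute; // cast each attribute node to its value
        },
        $doc->xpath('/x:RDF/x:ExternalPage/@about')
    );

    var_dump(count($plain), $urls);
    ```

    If the XPath from the answer returns an empty array on your file, a namespace mismatch like this is one thing to check.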
    
  • 2021-01-07 05:30

    Getting pure XMLReader code to work takes time and careful debugging. In the meantime, try this hybrid approach: XMLReader streams through the file, and SimpleXML parses one <ExternalPage> element at a time:

    $xmlR = new XMLReader;
    $xmlR->open('dbpedia/links/xml.xml');
    
    //Skip until <ExternalPage> node
    while ($xmlR->read() && $xmlR->name !== 'ExternalPage');
    
    $loadedNS_f = false;
    while ($xmlR->name === 'ExternalPage')
    {
        //Read the entire parent tag with children
        $sxmlNode = new SimpleXMLElement($xmlR->readOuterXML());
    
        //collect all namespaces in node recursively once; assuming all nodes are similar
        if (!$loadedNS_f) {
            $tagNS = $sxmlNode->getNamespaces(true);
            $loadedNS_f = true; 
        }
        $URL = (string) $sxmlNode['about'];
        $dNS = $sxmlNode->children($tagNS['d']);
        $Title = (string) $dNS->Title;
        $Desc = (string) $dNS->Description;
        $Topic = (string)$sxmlNode->topic;
    
        var_dump($URL, $Title, $Desc, $Topic);
    
        // Jump to next <ExternalPage> tag
        $xmlR->next('ExternalPage');
    }
    
    $xmlR->close();
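
    To try the hybrid approach without the original file, here is a self-contained sketch on made-up data. The sample XML, its namespace URI and the $rows variable are assumptions for illustration. It also uses the prefix form children('d', true) rather than collecting namespaces with getNamespaces().

    ```php
    <?php
    // Made-up sample data; the namespace URI is a placeholder.
    $xml = '<RDF xmlns:d="http://example.org/d">'
         . '<ExternalPage about="http://example.org/a">'
         . '<d:Title>First</d:Title><topic>Top/A</topic>'
         . '</ExternalPage>'
         . '<ExternalPage about="http://example.org/b">'
         . '<d:Title>Second</d:Title><topic>Top/B</topic>'
         . '</ExternalPage>'
         . '</RDF>';

    $reader = new XMLReader();
    $reader->XML($xml);

    // Skip forward to the first <ExternalPage> element.
    while ($reader->read() && $reader->name !== 'ExternalPage');

    $rows = [];
    while ($reader->name === 'ExternalPage') {
        // Parse only the current element and its children with SimpleXML.
        $node = new SimpleXMLElement($reader->readOuterXml());

        // Second argument true: 'd' is a namespace prefix, not a URI.
        $d = $node->children('d', true);

        $rows[] = [(string) $node['about'], (string) $d->Title, (string) $node->topic];

        // Jump to the next sibling <ExternalPage>, skipping the current subtree.
        $reader->next('ExternalPage');
    }
    $reader->close();

    print_r($rows);
    ```

    Only one <ExternalPage> subtree is held as a SimpleXML object at any time, which is what keeps memory use low on large files.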
    
  • 2021-01-07 05:40

    It is not working because you only read up to the start tag of the d:Title element, and an element node itself has no value:

    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'd:Title') {
        $title = $reader->value;
    }
    

    You probably wanted the nodeValue that DOM would give you for that element, but that is not what $reader->value returns. The text lives in the element's child nodes. Knowing this, there are multiple ways to deal with it:

    1. Expand the node (XMLReader::expand()) and get the nodeValue (quick example):

      $title = $reader->expand()->nodeValue;
      
    2. Process all XMLReader::TEXT (3) and/or XMLReader::CDATA (4) child nodes yourself (decide whether a node is a child node by comparing XMLReader::$depth).
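
    Option 2 can be sketched as follows. This is a minimal sketch, not the asker's code: readElementText() is a hypothetical helper name, and the inline sample document and its namespace URI are made up for illustration. It also records $reader->value while the reader sits on the start tag, to show that it is an empty string there.

    ```php
    <?php
    // Hypothetical helper: concatenate all TEXT and CDATA nodes below the current element.
    function readElementText(XMLReader $reader): string
    {
        // An empty element such as <d:Title/> has no children to read.
        if ($reader->isEmptyElement) {
            return '';
        }

        $depth = $reader->depth; // depth of the element we started on
        $text = '';

        // Every node deeper than $depth belongs to this element.
        while ($reader->read() && $reader->depth > $depth) {
            if ($reader->nodeType === XMLReader::TEXT
                || $reader->nodeType === XMLReader::CDATA
            ) {
                $text .= $reader->value;
            }
        }

        return $text;
    }

    // Made-up sample data; the namespace URI is a placeholder.
    $xml = '<RDF xmlns:d="http://example.org/d">'
         . '<ExternalPage about="http://example.org/page">'
         . '<d:Title>Example <![CDATA[& Title]]></d:Title>'
         . '</ExternalPage></RDF>';

    $reader = new XMLReader();
    $reader->XML($xml);

    $valueAtStartTag = null;
    $title = null;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'd:Title') {
            $valueAtStartTag = $reader->value; // value of the element node itself
            $title = readElementText($reader);  // text collected from its children
        }
    }
    $reader->close();

    var_dump($valueAtStartTag, $title);
    ```

    Compared with expand(), this never builds a DOM subtree, at the cost of tracking the depth yourself.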

    In any case, to streamline your code, consider providing exactly what you need directly, for example by writing your own set of functions or by extending the XMLReader class:

    class MyXMLReader extends XMLReader
    {
        // Advance to the next element node; returns false at the end of the document.
        public function readToNextElement()
        {
            while (
                $result = $this->read()
                and $this->nodeType !== self::ELEMENT
            ) ;
            return $result;
        }
    
        // Advance to the next element with the given local name (namespace prefix ignored).
        public function readToNext($localname)
        {
            while (
                $result = $this->readToNextElement()
                and $this->localName !== $localname
            ) ;
            return $result;
        }
    
        // Advance to the next element below the parent at $depth;
        // returns false once the reader has left that parent.
        public function readToNextChildElement($depth)
        {
            // if the current element is the parent and
            // empty there are no children to go into
            if ($this->depth == $depth && $this->isEmptyElement) {
                return false;
            }
    
            while ($result = $this->read()) {
                if ($this->depth <= $depth) return false;
                if ($this->nodeType === self::ELEMENT) break;
            }
    
            return $result;
        }
    
        // Text content of the current node via expand(), or $default if expansion fails.
        public function getNodeValue($default = NULL)
        {
            $node = $this->expand();
            return $node ? $node->nodeValue : $default;
        }
    }
    

    You can then just use this extended class to do your processing:

    $reader = new MyXMLReader();
    $reader->open($uri);
    
    $num = 0;
    while ($reader->readToNext('ExternalPage') and $num < 200) {
        $url = $reader->getAttribute('about');
    
        $depth = $reader->depth;
        $title = $desc = '';
    
        while ($reader->readToNextChildElement($depth)) {
            switch ($reader->localName) {
                case 'Title':
                    $title = $reader->getNodeValue();
                    break;
                case 'Description':
                    $desc = trim($reader->getNodeValue());
                    break;
            }
        }
    
        $num++;
        echo "#", $num, ": ", $url, " - ", $title, " - ", $desc, "<br />\n";
    }
    

    As you can see, this makes your code much more readable, and you no longer have to check each time whether you have read the nodes correctly.
