Why is this XMLReader code not working?


I have a file that looks like this:

    
       About.com: Animation Guide


                      
3 answers

北荒, 2021-01-07 05:28

Here's an alternate way to get to that attribute:

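// Load the whole document into memory and parse it with SimpleXML
$string = file_get_contents($filename);
$xml = new SimpleXMLElement($string);
// Select the "about" attribute of every ExternalPage that has at least one child element
$result = $xml->xpath('/RDF/ExternalPage[*]/@about');
var_dump($result);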
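
One caveat with the XPath approach: if the file declares a default namespace on its root element (an assumption here, since the sample in the question is not shown in full), unprefixed steps such as /RDF/ExternalPage match nothing. A minimal sketch of the usual workaround follows; the inline sample, the namespace URIs and the prefix "x" are all hypothetical.

```php
<?php
// Hypothetical sample; the namespace URIs are made up for illustration.
$string = <<<XML
<RDF xmlns="http://example.org/rdf/" xmlns:d="http://example.org/dc/">
  <ExternalPage about="http://example.org/animation">
    <d:Title>About.com: Animation Guide</d:Title>
  </ExternalPage>
  <ExternalPage about="http://example.org/second">
    <d:Title>Second page</d:Title>
  </ExternalPage>
</RDF>
XML;

$xml = new SimpleXMLElement($string);

// Unprefixed steps address elements in no namespace, so nothing matches here.
$plain = $xml->xpath('/RDF/ExternalPage/@about');

// Bind a prefix of our own choosing ("x") to the default namespace URI.
$xml->registerXPathNamespace('x', 'http://example.org/rdf/');
$result = $xml->xpath('/x:RDF/x:ExternalPage/@about');

// Each match is a SimpleXMLElement; cast it to a string to get the attribute value.
$urls = array_map(function ($attr) {
    return (string) $attr;
}, $result);

print_r($urls);
```

If the real file has no default namespace, the original unprefixed expression is enough and the registration step can be dropped.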

                                                                        
                                                        
            
            
              
                
無奈伤痛, 2021-01-07 05:30

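Getting pure XMLReader code to work will take time and proper debugging. In the meantime, try this hybrid approach, which streams through the file with XMLReader and parses each <ExternalPage> element with SimpleXML: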

$xmlR = new XMLReader;
$xmlR->open('dbpedia/links/xml.xml');

//Skip until <ExternalPage> node
while ($xmlR->read() && $xmlR->name !== 'ExternalPage');

$loadedNS_f = false;
while ($xmlR->name === 'ExternalPage')
{
    //Read the entire parent tag with children
    $sxmlNode = new SimpleXMLElement($xmlR->readOuterXML());

    //collect all namespaces in node recursively once; assuming all nodes are similar
    if (!$loadedNS_f) {
        $tagNS = $sxmlNode->getNamespaces(true);
        $loadedNS_f = true; 
    }
    $URL = (string) $sxmlNode['about'];
    $dNS = $sxmlNode->children($tagNS['d']);
    $Title = (string) $dNS->Title;
    $Desc = (string) $dNS->Description;
    $Topic = (string)$sxmlNode->topic;

    var_dump($URL, $Title, $Desc, $Topic);

    // Jump to next <ExternalPage> tag
    $xmlR->next('ExternalPage');
}

$xmlR->close();
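
A variant of the same hybrid idea, sketched here under assumptions: instead of re-parsing the string from readOuterXML(), it hands the current node to SimpleXML via XMLReader::expand() and simplexml_import_dom(), and it selects the d: children by prefix so no namespace lookup is needed. The inline sample and its namespace URIs are hypothetical stand-ins for the real file.

```php
<?php
// Hypothetical inline sample standing in for the real file; URIs are assumptions.
$sample = <<<XML
<RDF xmlns="http://example.org/rdf/" xmlns:d="http://example.org/dc/">
  <ExternalPage about="http://example.org/one">
    <d:Title>First</d:Title>
    <d:Description>First description</d:Description>
    <topic>Top/Arts</topic>
  </ExternalPage>
  <ExternalPage about="http://example.org/two">
    <d:Title>Second</d:Title>
    <d:Description>Second description</d:Description>
    <topic>Top/Games</topic>
  </ExternalPage>
</RDF>
XML;

$xmlR = new XMLReader;
$xmlR->XML($sample);        // for a real file, call open($filename) instead

$doc  = new DOMDocument;    // owner document for the imported nodes
$rows = [];

// Advance to the first <ExternalPage> start tag, if there is one
$found = false;
while ($xmlR->read()) {
    if ($xmlR->nodeType === XMLReader::ELEMENT && $xmlR->name === 'ExternalPage') {
        $found = true;
        break;
    }
}

while ($found) {
    // expand() builds a DOM subtree for the current element only
    $node = simplexml_import_dom($doc->importNode($xmlR->expand(), true));

    // children('d', true) selects child elements by prefix rather than by URI
    $dNS = $node->children('d', true);

    $rows[] = [
        'url'   => (string) $node['about'],
        'title' => (string) $dNS->Title,
        'desc'  => (string) $dNS->Description,
        'topic' => (string) $node->topic,
    ];

    // Skip the rest of this subtree and move to the next <ExternalPage>
    $found = $xmlR->next('ExternalPage');
}

$xmlR->close();
print_r($rows);
```

Only one <ExternalPage> subtree is held in memory at a time, which is the point of combining the two APIs on a large file.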

                                                                        
                                                        
            
            
              
                
孤城傲影, 2021-01-07 05:40

It is not working because you only read up to the start tag of the d:Title element, and an element node has no value of its own:

if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'd:Title') {
    // The reader is positioned on the start tag here, so value is an empty string
    $title = $reader->value;
}


You probably wanted the nodeValue of the corresponding DOM element, but that is not what $reader->value returns. Knowing this, there are several ways to deal with it:


Expand the node (XMLReader::expand()) and get the nodeValue (quick example):

$title = $reader->expand()->nodeValue;

Process all XMLReader::TEXT (3) and/or XMLReader::CDATA (4) child nodes yourself (use XMLReader::$depth to decide whether a node is a child node).
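
Both options can be compared side by side in a small sketch. It assumes a hypothetical one-element document (the namespace URI is made up) and shows the empty value on the start tag, the text obtained via expand(), and the same text collected by hand from the TEXT and CDATA nodes below the element.

```php
<?php
// Hypothetical sample; the namespace URI is an assumption for illustration.
$sample = '<r xmlns:d="http://example.org/dc/">'
        . '<d:Title>About.com: <![CDATA[Animation Guide]]></d:Title>'
        . '</r>';

$reader = new XMLReader();
$reader->XML($sample);

$atStartTag = null;
$viaExpand  = null;
$viaText    = '';

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'd:Title') {
        // On the start tag itself the reader has no value
        $atStartTag = $reader->value;

        // Option 1: expand to a DOM node and read its nodeValue
        $viaExpand = $reader->expand()->nodeValue;

        // Option 2: walk every node below the element and
        // concatenate the text and CDATA pieces
        $depth = $reader->depth;
        while ($reader->read() && $reader->depth > $depth) {
            if ($reader->nodeType === XMLReader::TEXT
                || $reader->nodeType === XMLReader::CDATA
            ) {
                $viaText .= $reader->value;
            }
        }
    }
}

$reader->close();
var_dump($atStartTag, $viaExpand, $viaText);
```

Note that the manual loop also picks up text from nested elements, because it accepts every node deeper than the start tag; compare against $depth + 1 if only direct children should count.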


In any case, to streamline your code, consider providing the operations you need directly, for example by writing your own set of helper functions or by extending the XMLReader class:

class MyXMLReader extends XMLReader
{
    public function readToNextElement()
    {
        while (
            $result = $this->read()
            and $this->nodeType !== self::ELEMENT
        ) ;
        return $result;
    }

    public function readToNext($localname)
    {
        while (
            $result = $this->readToNextElement()
            and $this->localName !== $localname
        ) ;
        return $result;
    }

    public function readToNextChildElement($depth)
    {
        // if the current element is the parent and
        // empty there are no children to go into
        if ($this->depth == $depth && $this->isEmptyElement) {
            return false;
        }

        while ($result = $this->read()) {
            if ($this->depth <= $depth) return false;
            if ($this->nodeType === self::ELEMENT) break;
        }

        return $result;
    }

    public function getNodeValue($default = NULL)
    {
        $node = $this->expand();
        return $node ? $node->nodeValue : $default;
    }
}


You can then just use this extended class to do your processing:

$reader = new MyXMLReader();
$reader->open($uri);

$num = 0;
while ($reader->readToNext('ExternalPage') and $num < 200) {
    $url = $reader->getAttribute('about');

    $depth = $reader->depth;
    $title = $desc = '';

    while ($reader->readToNextChildElement($depth)) {
        switch ($reader->localName) {
            case 'Title':
                $title = $reader->getNodeValue();
                break;
            case 'Description':
                $desc = trim($reader->getNodeValue());
                break;
        }
    }

    $num++;
    echo "#", $num, ": ", $url, " - ", $title, " - ", $desc, "<br />\n";
}


As you can see, this makes your code far more readable, and you no longer need to worry each time about whether you are reading the nodes correctly.
                                                                        
                                                        
            
            
              
                