XML end element is read twice using XMLReader with PHP

Submitted by 扶醉桌前 on 2020-01-05 09:23:58

Question


I want to read an XML file using XMLReader, but the END_ELEMENT is called twice for each element during parsing.

<publications>
  <article id="Xu86oazdn">
    <title>Learning</title>
    <authors>
      <author>
        <firstname>Michel</firstname>
        <lastname>Browsky</lastname>
      </author>
    </authors>
  </article>
</publications>

This is the piece of code that parses the author entries:

<?php
$xml = new XMLReader();
$xml->open("php://stdin");
$author = null;

while($xml->read()) {

  switch($xml->nodeType) {
    case XMLReader::ELEMENT:
      switch($xml->name) {
        case 'author':
          echo("+" . $xml->name);
          break;
      }

    case XMLReader::END_ELEMENT:
      switch($xml->name) {
        case 'author':
          echo("-" . $xml->name);
          break;
      }
  }
}
?>

But strangely, the END_ELEMENT branch runs twice for each </author>, as the echo output shows:

+author
-author
-author

If I replace the echo with a call to $xml->readOuterXML(), the first END_ELEMENT prints the following:

<author>
  <firstname>Michel</firstname>
  <lastname>Browsky</lastname>
</author>

And the second one is the following:

<author/>

What is wrong with my code? Did I use END_ELEMENT the wrong way? What is the right way to detect the end element?


Answer 1:


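Add a break statement at the end of the ELEMENT case of the outer switch on nodeType. The break inside the inner switch only leaves that inner switch; without a second break, execution falls through from the ELEMENT case into the END_ELEMENT case, so every opening <author> tag prints "-author" right after "+author":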

<?php
$xml = new XMLReader();
$xml->open("php://stdin");

while($xml->read()) {

  switch($xml->nodeType) {
    case XMLReader::ELEMENT:
      switch($xml->name) {
        case 'author':
          echo("+" . $xml->name);
          break;
      }

      // This break was missing: without it, execution falls
      // through from the ELEMENT case into the END_ELEMENT case.
      break;

    case XMLReader::END_ELEMENT:
      switch($xml->name) {
        case 'author':
          echo("-" . $xml->name);
          break;
      }
  }
}
?>
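To see the fall-through in isolation, here is a minimal sketch that runs both versions of the loop over the sample document from the question. It reads from a string with XMLReader::XML() instead of php://stdin, and it replaces the inner switch on the name with a plain if. The function traceAuthors() and its $withBreak flag are hypothetical, introduced only for this illustration.

```php
<?php
// Hypothetical helper: replays the question's loop over an XML string and
// returns the "+author"/"-author" trace instead of echoing it.
// When $withBreak is false, the ELEMENT case falls through into END_ELEMENT.
function traceAuthors(string $xmlString, bool $withBreak): string
{
    $xml = new XMLReader();
    $xml->XML($xmlString);
    $trace = '';

    while ($xml->read()) {
        switch ($xml->nodeType) {
            case XMLReader::ELEMENT:
                if ($xml->name === 'author') {
                    $trace .= '+' . $xml->name;
                }
                if ($withBreak) {
                    break; // leaves the switch: no fall-through
                }
                // no break here: execution continues into the next case

            case XMLReader::END_ELEMENT:
                if ($xml->name === 'author') {
                    $trace .= '-' . $xml->name;
                }
                break;
        }
    }

    $xml->close();
    return $trace;
}
```

With $withBreak set to false, the sample document produces "+author-author-author", the output reported in the question; with true, it produces "+author-author".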

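Add another break at the end of the END_ELEMENT case as well, if only for symmetry. Put it inside the outer switch, before its closing brace; a break placed after that brace would exit the while loop instead:

    case XMLReader::END_ELEMENT:
      switch($xml->name) {
        case 'author':
          echo("-" . $xml->name);
          break;
      }
      break;
  }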

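The underlying problem is the coding style: with one switch nested inside another, a missing break is easy to overlook. Simplify the code so that each case only dispatches to a handler. For example: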

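<?php
// startElement() and endElement() are your own handlers;
// these example versions just echo the author markers.
function startElement( $name ) {
  if( $name == 'author' ) {
    echo("+" . $name);
  }
}

function endElement( $name ) {
  if( $name == 'author' ) {
    echo("-" . $name);
  }
}

$xml = new XMLReader();
$xml->open("php://stdin");

while($xml->read()) {
  switch($xml->nodeType) {
    case XMLReader::ELEMENT: {
      startElement( $xml->name );
      break;
    }

    case XMLReader::END_ELEMENT: {
      endElement( $xml->name );
      break;
    }
  }
}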
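One caveat when detecting end elements: XMLReader reports a self-closing tag such as <author/> as a single ELEMENT node with isEmptyElement set to true, and emits no END_ELEMENT node for it. A minimal sketch, assuming you want start and end events to arrive in pairs anyway, synthesizes the missing end event. The function collectEvents() and its "start:"/"end:" event strings are hypothetical stand-ins for handlers like startElement() and endElement().

```php
<?php
// Hypothetical dispatcher: collects "start:<name>" / "end:<name>" events.
// A self-closing tag produces no END_ELEMENT node, so its end event is
// synthesized from isEmptyElement while the reader is still on that node.
function collectEvents(string $xmlString): array
{
    $xml = new XMLReader();
    $xml->XML($xmlString);
    $events = [];

    while ($xml->read()) {
        switch ($xml->nodeType) {
            case XMLReader::ELEMENT:
                $events[] = 'start:' . $xml->name;
                if ($xml->isEmptyElement) {
                    $events[] = 'end:' . $xml->name; // no END_ELEMENT will follow
                }
                break;

            case XMLReader::END_ELEMENT:
                $events[] = 'end:' . $xml->name;
                break;
        }
    }

    $xml->close();
    return $events;
}
```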

You can simplify further. PHP has XML marshalling packages, but you could also move the parsing code into classes whose instances read themselves from (or write themselves to) an XML file. For example:

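$xml = new XMLReader();
$xml->open("php://stdin");

while($xml->read()) {
  // React only to opening <author> tags; the closing </author> has the same name.
  if( $xml->nodeType == XMLReader::ELEMENT && $xml->name == 'author' ) {
    $author = new Author(); // Author is your own class
    $author->marshall( $xml );
  }
}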
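The answer leaves the Author class undefined. A minimal sketch of what it might look like, assuming marshall() is called while the reader sits on an opening <author> tag and should consume nodes up to the matching </author>. The property names mirror the sample document and are assumptions; typed properties need PHP 7.4 or later.

```php
<?php
// Hypothetical Author class that reads ("marshalls") itself from an XMLReader
// positioned on an opening <author> element.
class Author
{
    public string $firstname = '';
    public string $lastname = '';

    public function marshall(XMLReader $xml): void
    {
        if ($xml->isEmptyElement) {
            return; // <author/> has no children and no END_ELEMENT
        }

        $depth = $xml->depth; // depth of the opening <author> tag
        while ($xml->read()) {
            if ($xml->nodeType === XMLReader::END_ELEMENT && $xml->depth === $depth) {
                return; // reached the matching </author>
            }
            if ($xml->nodeType === XMLReader::ELEMENT
                && ($xml->name === 'firstname' || $xml->name === 'lastname')) {
                $field = $xml->name;
                $this->$field = $xml->readString(); // text content of the element
            }
        }
    }
}
```

Comparing depths rather than names keeps the method correct even if an <author> element were ever nested inside another one.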

This keeps the details of how an object is stored together with the object itself: any time you change the Author class, you know you must also change how it marshalls itself. You could take these ideas further with appropriate design patterns, XML schemas, and so forth.

Thus your final code might resemble:

$xml = new XMLReader();
$xml->open( "php://stdin" );
$publications = new Publications();
$publications->marshall( $xml );

The Publications object is responsible for reading the XML document and instantiating the appropriate classes whenever their associated XML tags appear:

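public function marshall( XMLReader $xml ) {
  while($xml->read()) {
    // Create an Article only for opening <article> tags.
    if( $xml->nodeType == XMLReader::ELEMENT && $xml->name == 'article' ) {
      $article = new Article();
      $article->marshall( $xml );
      $this->add( $article );
    }
  }
}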
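A minimal, self-contained sketch of the Publications and Article classes described above, assuming each marshall() starts on its own opening tag and stops at the matching closing tag. The class members (id, title, add(), getArticles()) are hypothetical; this Article only reads its id attribute and <title> text and skips other children such as <authors>. Typed properties need PHP 7.4 or later.

```php
<?php
// Hypothetical Article: reads its id attribute and <title> text, then stops
// at the matching </article>. Other children (e.g. <authors>) are skipped.
class Article
{
    public string $id = '';
    public string $title = '';

    public function marshall(XMLReader $xml): void
    {
        $this->id = (string) $xml->getAttribute('id');
        if ($xml->isEmptyElement) {
            return; // <article/> has no END_ELEMENT
        }
        $depth = $xml->depth;
        while ($xml->read()) {
            if ($xml->nodeType === XMLReader::END_ELEMENT && $xml->depth === $depth) {
                return; // reached the matching </article>
            }
            if ($xml->nodeType === XMLReader::ELEMENT && $xml->name === 'title') {
                $this->title = $xml->readString();
            }
        }
    }
}

// Hypothetical container: creates an Article only for opening <article> tags.
class Publications
{
    /** @var Article[] */
    private array $articles = [];

    public function add(Article $article): void
    {
        $this->articles[] = $article;
    }

    public function getArticles(): array
    {
        return $this->articles;
    }

    public function marshall(XMLReader $xml): void
    {
        while ($xml->read()) {
            if ($xml->nodeType === XMLReader::ELEMENT && $xml->name === 'article') {
                $article = new Article();
                $article->marshall($xml);
                $this->add($article);
            }
        }
    }
}
```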

Use a PHP marshalling framework to save yourself time and effort. Consider XML_Serializer:

  • http://pear.php.net/package/XML_Serializer


Source: https://stackoverflow.com/questions/5060936/xml-end-element-is-read-twice-using-xmlreader-with-php
