Question
I want to read an XML file using XMLReader, but END_ELEMENT is reported twice for each element during parsing.
<publications>
  <article id="Xu86oazdn">
    <title>Learning</title>
    <authors>
      <author>
        <firstname>Michel</firstname>
        <lastname>Browsky</lastname>
      </author>
    </authors>
  </article>
</publications>
This is the code that parses the author entries:
<?php
$xml = new XMLReader();
$xml->open("php://stdin");
$author = null;
while($xml->read()) {
    switch($xml->nodeType) {
        case XMLReader::ELEMENT:
            switch($xml->name) {
                case 'author':
                    echo("+" . $xml->name);
                    break;
            }
        case XMLReader::END_ELEMENT:
            switch($xml->name) {
                case 'author':
                    echo("-" . $xml->name);
                    break;
            }
    }
}
?>
But strangely, END_ELEMENT is reported twice for each </author>, as shown by the echo messages:
+author
-author
-author
If I replace the echo message with a call to $xml->readOuterXML(), the first END_ELEMENT gives the following:
<author>
<firstname>Michel</firstname>
<lastname>Browsky</lastname>
</author>
And the second one is the following:
<author/>
What is wrong with my code? Am I using END_ELEMENT incorrectly? What is the right way to detect the end of an element?
Answer 1:
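Add a break statement at the end of the ELEMENT case in the outer switch on nodeType. The break inside the inner switch only leaves the inner switch, so without an outer break execution falls through into the END_ELEMENT case and prints "-author" for the opening tag as well: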
<?php
$xml = new XMLReader();
$xml->open("php://stdin");
while($xml->read()) {
    switch($xml->nodeType) {
        case XMLReader::ELEMENT:
            switch($xml->name) {
                case 'author':
                    echo("+" . $xml->name);
                    break;
            }
            // This break is missing in the question's code;
            // without it, execution falls through to END_ELEMENT.
            break;
        case XMLReader::END_ELEMENT:
            switch($xml->name) {
                case 'author':
                    echo("-" . $xml->name);
                    break;
            }
    }
}
?>
Add another break at the end of the END_ELEMENT case as well, if only for symmetry.
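        case XMLReader::END_ELEMENT:
            switch($xml->name) {
                case 'author':
                    echo("-" . $xml->name);
                    break;
            }
            // Must sit inside the outer switch; placed after its closing
            // brace, it would leave the while loop instead.
            break;
    }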
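To check the fix without piping a file through stdin, the corrected loop can be run against an in-memory string. This is a minimal sketch: traceAuthors() is a hypothetical helper introduced here for illustration, and the shortened XML sample is an assumption, not the asker's full document.

```php
<?php
// Sketch only: traceAuthors() is a hypothetical helper, not part of XMLReader.
function traceAuthors(string $source): array
{
    $events = [];
    $xml = new XMLReader();
    $xml->XML($source); // parse from a string instead of php://stdin

    while ($xml->read()) {
        switch ($xml->nodeType) {
            case XMLReader::ELEMENT:
                if ($xml->name === 'author') {
                    $events[] = '+' . $xml->name;
                }
                break; // stops the fall-through into END_ELEMENT
            case XMLReader::END_ELEMENT:
                if ($xml->name === 'author') {
                    $events[] = '-' . $xml->name;
                }
                break;
        }
    }
    $xml->close();

    return $events;
}

$source = '<authors><author><firstname>Michel</firstname></author></authors>';
echo implode("\n", traceAuthors($source)), "\n";
```

With both breaks in place, each <author> element should produce one "+author" and one "-author" event. Removing the first break brings back the duplicated "-author".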
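The problem arose from the coding style: with nested switch statements, a missing break is easy to overlook. Simplify the code by moving the per-element logic into handler functions (startElement() and endElement() below are functions you would define yourself). For example: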
$xml = new XMLReader();
$xml->open("php://stdin");
while($xml->read()) {
    switch($xml->nodeType) {
        case XMLReader::ELEMENT: {
            startElement( $xml->name );
            break;
        }
        case XMLReader::END_ELEMENT: {
            endElement( $xml->name );
            break;
        }
    }
}
There are further simplifications you can make. PHP has an XML marshalling package, but you could also abstract the code into classes. Instances of those classes would then be able to read (or write) themselves from (or to) an XML file. For example:
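$xml = new XMLReader();
$xml->open("php://stdin");
while($xml->read()) {
    // Match only the opening tag; the name alone also matches </author>.
    if( $xml->nodeType == XMLReader::ELEMENT && $xml->name == 'author' ) {
        $author = new Author();
        $author->marshall( $xml );
    }
}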
This couples the details of how the object is stored with the object itself. Any time you change the Author object, you know you must change how it marshalls itself. You could abstract and extend these concepts even further using appropriate design patterns, XML schemas, and so forth.
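As an illustration of an object that reads itself from the stream, here is a minimal sketch of what such an Author class might look like. The class, its properties, and the body of marshall() are all assumptions; the answer only names the class and the method. It assumes the reader is positioned on the opening <author> tag and requires PHP 7.4 or later for typed properties.

```php
<?php
// Sketch only: Author is a hypothetical class; only its name and the
// marshall() method name come from the answer.
class Author
{
    public string $firstname = '';
    public string $lastname = '';

    // Assumes $xml is currently positioned on the opening <author> tag.
    public function marshall(XMLReader $xml): void
    {
        if ($xml->isEmptyElement) {
            return; // <author/> has no children and no END_ELEMENT node
        }
        while ($xml->read()) {
            if ($xml->nodeType === XMLReader::END_ELEMENT && $xml->name === 'author') {
                return; // reached </author>: this object is complete
            }
            if ($xml->nodeType === XMLReader::ELEMENT) {
                switch ($xml->name) {
                    case 'firstname':
                        $this->firstname = $xml->readString();
                        break;
                    case 'lastname':
                        $this->lastname = $xml->readString();
                        break;
                }
            }
        }
    }
}

$xml = new XMLReader();
$xml->XML('<authors><author><firstname>Michel</firstname><lastname>Browsky</lastname></author></authors>');

$authors = [];
while ($xml->read()) {
    if ($xml->nodeType === XMLReader::ELEMENT && $xml->name === 'author') {
        $author = new Author();
        $author->marshall($xml);
        $authors[] = $author;
    }
}
$xml->close();

foreach ($authors as $author) {
    echo $author->firstname, ' ', $author->lastname, "\n";
}
```

Because marshall() consumes nodes up to and including </author>, the outer loop never sees that closing tag, so it needs no END_ELEMENT handling of its own.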
Thus your final code might resemble:
$xml = new XMLReader();
$xml->open( "php://stdin" );
$publications = new Publications();
$publications->marshall( $xml );
The Publications object is responsible for reading the XML document and instantiating the appropriate classes whenever their associated XML tags appear:
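while($xml->read()) {
    // Create an Article only when an opening <article> tag appears.
    if( $xml->nodeType == XMLReader::ELEMENT && $xml->name == 'article' ) {
        $article = new Article();
        $article->marshall( $xml );
        $this->add( $article );
    }
}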
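Putting the pieces together, the following is a rough, self-contained sketch of how Publications and Article might cooperate. Both classes, their properties, and the add() method body are assumptions made for illustration; only the class and method names appear in the answer. To stay short, Article reads just the id attribute and the title.

```php
<?php
// Sketch only: Article and Publications are hypothetical classes,
// not part of PHP or of any library.
class Article
{
    public string $id = '';
    public string $title = '';

    // Assumes $xml is currently positioned on the opening <article> tag.
    public function marshall(XMLReader $xml): void
    {
        $this->id = (string) $xml->getAttribute('id');
        if ($xml->isEmptyElement) {
            return; // <article/> has no children to read
        }
        while ($xml->read()) {
            if ($xml->nodeType === XMLReader::END_ELEMENT && $xml->name === 'article') {
                return; // reached </article>
            }
            if ($xml->nodeType === XMLReader::ELEMENT && $xml->name === 'title') {
                $this->title = $xml->readString();
            }
        }
    }
}

class Publications
{
    /** @var Article[] */
    public array $articles = [];

    public function add(Article $article): void
    {
        $this->articles[] = $article;
    }

    public function marshall(XMLReader $xml): void
    {
        while ($xml->read()) {
            // Hand over to Article whenever an opening <article> tag appears.
            if ($xml->nodeType === XMLReader::ELEMENT && $xml->name === 'article') {
                $article = new Article();
                $article->marshall($xml);
                $this->add($article);
            }
        }
    }
}

$xml = new XMLReader();
$xml->XML('<publications><article id="Xu86oazdn"><title>Learning</title></article></publications>');
$publications = new Publications();
$publications->marshall($xml);
$xml->close();

foreach ($publications->articles as $article) {
    echo $article->id, ': ', $article->title, "\n";
}
```

Each class only knows its own tag, so supporting a new element means adding one class and one branch in its parent's marshall() method.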
Use a PHP marshalling framework to save yourself time and effort. Consider XML_Serializer:
- http://pear.php.net/package/XML_Serializer
Source: https://stackoverflow.com/questions/5060936/xml-end-element-is-read-twice-using-xmlreader-with-php