How to traverse SimpleXML to edit text nodes?

Posted by 匆匆过客 on 2019-12-20 06:19:28

Question


I need to implement the following algorithm with SimpleXML:

  1. load an XML fragment string into a SimpleXML object;
  2. traverse all the nodes, selecting the text nodes;
  3. edit each text node (for example, convert it to upper case);
  4. return the XML as a string.

PROBLEMS:

  • How to load an XML string that contains named entities (e.g. &nbsp;).

  • How to traverse the XML and get only the text nodes. With $sx->xpath('//text()'); I cannot edit the nodes, so how can I select text nodes for editing?


Answer 1:


You can override the text content of a node returned by a SimpleXML XPath query by assigning to $node[0], e.g.

foreach ( $sx->xpath('//text()') as $text_node )
{
    $text_node[0] = 'Hello';
}

However, beware that SimpleXML does not really have a representation of a text node per se, so this kind of loop will behave oddly if there are both child elements and text within an element.

For instance, given the XML <a><b>foo<c />bar</b><b>baz quux</b></a>, the two text nodes containing foo and bar are both represented in SimpleXML by the first <b> element, so the entire contents of that element are replaced by 'Hello' twice over, as shown below. Putting a counter variable in the substituted text makes it clear what is happening: the desired output would be <a><b>Hello 1<c />Hello 2</b><b>Hello 3</b></a>, but the actual result is <a><b>Hello 2</b><b>Hello 3</b></a>.

$sx = simplexml_load_string('<a><b>foo<c />bar</b><b>baz quux</b></a>');

$counter = 1;
foreach ( $sx->xpath('//text()') as $text_node )
{
     $text_node[0] = 'Hello ' . $counter++;
}

echo $sx->asXML();

This kind of manipulation, at least as you've framed the problem (finding text nodes, rather than iterating, possibly recursively, over a particular set of elements), is much better suited to the DOM API than to SimpleXML. Bear in mind that there is no performance difference between the two (they are both wrappers around the same XML parser), and that you can combine operations from both APIs on the same document by using simplexml_import_dom() and dom_import_simplexml(), again without additional overhead, because the document doesn't need to be re-parsed.
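As a minimal sketch of that round trip (the sample XML, the attribute name, and the variable names are illustrative assumptions, not part of the original question), a change made through one API should be visible through the other, since both wrap the same in-memory node:

```php
<?php
$sx = simplexml_load_string('<a><b>foo</b></a>');

// SimpleXML -> DOM: returns the DOMElement for the same <a> node.
$domElement = dom_import_simplexml($sx);
$domElement->setAttribute('edited', 'yes');

// The attribute added through DOM can be read back through SimpleXML.
$seenBySimpleXml = (string) $sx['edited'];

// DOM -> SimpleXML: wraps the same node again, without re-parsing.
$sxAgain = simplexml_import_dom($domElement);
$sxAgain->b = 'changed';

// Serialise only the root element, so no XML declaration is prepended.
$result = $domElement->ownerDocument->saveXML($domElement);

echo $seenBySimpleXml, "\n", $result, "\n";
```

Neither import function copies the document, so there is no need to keep the two views in sync by hand.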

Here is the earlier counter example, fixed by using a mixture of SimpleXML and DOM. If this were the whole code, you could just parse with DOM directly, but this demonstrates how easy the two are to mix if you already have other code manipulating this document with SimpleXML. Note that at the end, we output the XML using the original SimpleXML object - we don't need to run simplexml_import_dom($dom), because both objects refer to the same parsed "document" in memory.

$sx = simplexml_load_string('<a><b>foo<c />bar</b><b>baz quux</b></a>');
$dom = dom_import_simplexml($sx);

$counter = 1;
$xpath = new DOMXPath($dom->ownerDocument);
foreach ( $xpath->query('//text()') as $text_node )
{
     $text_node->nodeValue = 'Hello ' . $counter++;
}

echo $sx->asXML();
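
Applied to the four steps in the question (load, select text nodes, edit, return a string), the same approach could be sketched as below. The function name uppercaseTextNodes() is hypothetical, and strtoupper() is only a stand-in for whatever edit is needed; for non-ASCII text, mb_strtoupper() from the mbstring extension would probably be the better choice.

```php
<?php
// Hypothetical helper: the name and signature are illustrative only.
function uppercaseTextNodes(string $xml): string
{
    // Step 1: load the fragment into a SimpleXML object.
    $sx = simplexml_load_string($xml);
    if ($sx === false) {
        throw new InvalidArgumentException('Could not parse XML');
    }

    // Step 2: switch to DOM, which has real text node objects.
    $dom = dom_import_simplexml($sx);
    $xpath = new DOMXPath($dom->ownerDocument);

    // Step 3: edit each text node in place.
    foreach ($xpath->query('//text()') as $textNode) {
        $textNode->nodeValue = strtoupper($textNode->nodeValue);
    }

    // Step 4: serialise only the root element, so no XML declaration
    // is prepended to the returned fragment.
    return $dom->ownerDocument->saveXML($dom);
}

echo uppercaseTextNodes('<a><b>foo<c />bar</b><b>baz quux</b></a>'), "\n";
```

The child element <c /> is kept, and the text on either side of it is edited independently, which is the behaviour the plain SimpleXML loop could not provide.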


Source: https://stackoverflow.com/questions/17618280/how-to-traverse-simplexml-to-edit-text-nodes
