PHP XPath. Convert complex XML to array

醉酒成梦 2020-12-22 02:32

After three days of attempting to work it out by myself, I have to give up. This is an excerpt of a huge XML file exported from a database of Italian laws. I would like to convert this XML

1 answer
  • 2020-12-22 03:01

    Converting complex XML into an array will result ... in a complex array. You find the XML difficult to read and assume that arrays are much easier to handle, so an array looks like the solution. But XML is actually well suited to tree parsing in PHP, and an array is less accessible: for example, you can't run an XPath query on an array.
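    To make the XPath point concrete, here is a minimal sketch using PHP's built-in DOMXPath. The sample document is hypothetical (including the `NIR` root element and the second article); only the namespace URI and the `articolo`/`num`/`rubrica` element names come from this answer.

```php
<?php
// Hypothetical sample; the real export is much larger and its root element is an assumption.
$xml = <<<'XML'
<NIR xmlns="http://www.normeinrete.it/nir/2.1/">
  <LeggeRegionale>
    <articolato>
      <articolo><num>Art. 41</num><rubrica>(Riforma della finanza locale)</rubrica></articolo>
      <articolo><num>Art. 42</num><rubrica>(Esempio)</rubrica></articolo>
    </articolato>
  </LeggeRegionale>
</NIR>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// The document uses a default namespace, so a prefix must be registered for XPath.
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('a', 'http://www.normeinrete.it/nir/2.1/');

// A single expression reaches every article number, however deeply it is nested.
$nums = [];
foreach ($xpath->query('//a:articolo/a:num') as $node) {
    $nums[] = $node->textContent;
}

print_r($nums);
```

    Getting the same values out of a nested array would take hand-written loops for every level.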

    As for the array itself, your pseudo-array contains a mistake: it has duplicate keys. The structure looks more like:

    Array
    (
        [LeggeRegionale] => Array
            (
                [0] => Array
                    (
                        [intestazione] => Lex 12 2014, n. 26.
                        [articolato] => Array
                            (
                                [articolo] => Array
                                    (
                                        [0] => Array
                                            (
                                                [num] => Art. 41
                                                [rubrica] => (Riforma della finanza locale)
                                                [commi] => Array
                                                    (
                                                        [0] => Array
                                                            (
                                                                [num_alinea] => 1. Al fine di supportare...
                                                                [num_corpo] => Array
                                                                    (
                                                                        [0] => a) definizione di...
                                                                        [1] => b) coordinamento della...
                                                                        [2] => c) definizione delle...
                                                                        [3] => d) la disciplina...
                                                                    )
    
                                                            )
    
                                                        [1] => Array
                                                            (
                                                                [num_alinea] => 2. La revisione di...
                                                                [num_corpo] => Array
                                                                    (
                                                                        [0] => a) razionalizzazione e... articolo 1, comma 154, della legge 13 dicembre 2010, n. 220 (Legge di stabilità 2011);
                                                                        [1] => b) applicazione dei... articolo 119 della Costituzione , nonché del principio...
                                                                        [2] => c) valorizzazione...
                                                                        [3] => d) previsione di...
                                                                        [4] => e) valorizzazione del...
                                                                        [5] => f) previsione di...
                                                                    )
    

    ...

    And that is already with some optimization (you can ignore the wrongly encoded characters; they are a copy & paste error).
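    On the duplicate-key point: a PHP array keeps only one value per key, which is why repeated elements have to be nested under numeric indexes as in the dump above. A small illustration, with values borrowed from that dump:

```php
<?php
// A repeated string key does not raise an error; the later value silently replaces the earlier one.
$flat = [
    'num_corpo' => 'a) definizione di...',
    'num_corpo' => 'b) coordinamento della...',
];

// Repeating elements therefore belong in a nested list under a single key.
$nested = [
    'num_corpo' => [
        'a) definizione di...',
        'b) coordinamento della...',
    ],
];

echo count($flat), "\n";                // prints 1
echo count($nested['num_corpo']), "\n"; // prints 2
```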

    You may want to use a library that makes it easier to query child nodes via XPath directly. I've given an answer with a small, quick example in "use DOMXPath and function query php"; you would need to add namespace support to it. You would then need to take care of building the array yourself, which is actually quite complex:

    $doc = new DOMDocument();
    $doc->load('example.xml');
    
    /* DOMBLAZE II XMLNS */ $doc->registerNodeClass("DOMElement", "DOMBLAZE"); # ...
    
    /** @var $root DOMBLAZE */
    $root = $doc->documentElement;
    $root()->registerNamespace('a', 'http://www.normeinrete.it/nir/2.1/');
    
    $array = [];
    foreach ($root('a:LeggeRegionale') as $leggioRegionale) {
        $entry                 = [];
        $entry['intestazione'] = $leggioRegionale('string(./a:intestazione)');
        $articolato            = [];
        foreach ($leggioRegionale('a:articolato/a:articolo') as $articolo) {
            // Building each article entry (num, rubrica, commi) is left out here.
        }
        $array[] = $entry;
    }
    
    print_r($array);
    

    This example is (obviously) incomplete.
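    For comparison, here is a rough sketch of what the complete loop could look like with plain DOMXPath instead of DOMBLAZE. The sample document and its `NIR` root element are assumptions made for illustration; the element names (`comma`, `alinea`, `el`, `corpo`) follow the XPath expressions used elsewhere in this answer and may not match the full export.

```php
<?php
// Hypothetical, heavily shortened sample document.
$xml = <<<'XML'
<NIR xmlns="http://www.normeinrete.it/nir/2.1/">
  <LeggeRegionale>
    <intestazione>Lex 12 2014, n. 26.</intestazione>
    <articolato>
      <articolo>
        <num>Art. 41</num>
        <rubrica>(Riforma della finanza locale)</rubrica>
        <comma>
          <num>1.</num>
          <alinea> Al fine di supportare...</alinea>
          <el><num>a)</num><corpo> definizione di...</corpo></el>
          <el><num>b)</num><corpo> coordinamento della...</corpo></el>
        </comma>
      </articolo>
    </articolato>
  </LeggeRegionale>
</NIR>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('a', 'http://www.normeinrete.it/nir/2.1/');

$array = [];
foreach ($xpath->query('a:LeggeRegionale', $doc->documentElement) as $legge) {
    $entry = [
        'intestazione' => $xpath->evaluate('string(a:intestazione)', $legge),
        'articolato'   => ['articolo' => []],
    ];
    foreach ($xpath->query('a:articolato/a:articolo', $legge) as $articolo) {
        $commi = [];
        foreach ($xpath->query('a:comma', $articolo) as $comma) {
            // Each <el> becomes one "letter" line: its number joined with its body text.
            $corpo = [];
            foreach ($xpath->query('a:el', $comma) as $el) {
                $corpo[] = $xpath->evaluate('normalize-space(concat(a:num, a:corpo))', $el);
            }
            $commi[] = [
                'num_alinea' => $xpath->evaluate('concat(a:num, a:alinea)', $comma),
                'num_corpo'  => $corpo,
            ];
        }
        $entry['articolato']['articolo'][] = [
            'num'     => $xpath->evaluate('string(a:num)', $articolo),
            'rubrica' => $xpath->evaluate('string(a:rubrica)', $articolo),
            'commi'   => $commi,
        ];
    }
    $array['LeggeRegionale'][] = $entry;
}

print_r($array);
```

    Every level of the XML needs its own loop and its own keys, which is the complexity referred to above.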

    Alternatively, I experimented with writing the XPath expressions down in an XML document that itself defines the "array". A tailored SimpleXMLElement can then use that definition to build the array recursively:

    $doc = new DOMDocument();
    $doc->load('example.xml');
    
    $buffer = <<<XML
    <xmlarray>
        <xml>
            <namespace prefix="a" uri="http://www.normeinrete.it/nir/2.1/"/>
        </xml>
        <array>
            <LeggeRegionale expr="a:LeggeRegionale">
                <intestazione expr="string(a:intestazione)"/>
                <articolato expr="a:articolato">
                    <articolo expr="a:articolo">
                        <num expr="string(a:num)"/>
                        <rubrica expr="string(a:rubrica)"/>
                        <commi expr="a:comma">
                            <num_alinea expr="concat(a:num, a:alinea)"/>
                            <el expr="a:el" alias="num_corpo">
                                <num_corpo expr="normalize-space(concat(a:num, a:corpo))" cast="string"/>
                            </el>
                        </commi>
                    </articolo>
                </articolato>
            </LeggeRegionale>
        </array>
    </xmlarray>
    XML;
    
    $xmlArray = new XmlArrayElement($buffer);
    $xmlArray->assignDocument($doc);
    print_r($xmlArray->toArray());
    

    This effectively produces the array presented at the beginning of the answer.

    Sure, this now looks like the super solution to you, but all it did was wrap the XML tree in another tree, this time an array. The XmlArrayElement class is not included in this answer; it is in a gist.

    It's perhaps better to make use of the recursion to create another XML document on the fly.

    Also worth considering in your case is XSLT, which was technically made for exactly this. You could convert directly into an HTML document. HTML documents are much more portable than RTF documents, and there are existing tools to convert them into RTF as well as other formats.
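    A minimal sketch of the XSLT route, assuming PHP's xsl extension is installed. Both the sample document and the stylesheet are hypothetical; only the namespace URI and the element names come from this answer, and a real stylesheet would also need templates for the commas and their letters.

```php
<?php
// XSLTProcessor comes from ext-xsl, which is not always enabled.
if (!class_exists('XSLTProcessor')) {
    exit("The xsl extension is not installed.\n");
}

// Hypothetical, shortened sample document.
$xml = <<<'XML'
<NIR xmlns="http://www.normeinrete.it/nir/2.1/">
  <LeggeRegionale>
    <intestazione>Lex 12 2014, n. 26.</intestazione>
    <articolato>
      <articolo><num>Art. 41</num><rubrica>(Riforma della finanza locale)</rubrica></articolo>
    </articolato>
  </LeggeRegionale>
</NIR>
XML;

// Illustrative stylesheet: one heading per law, one sub-heading per article.
$xsl = <<<'XSL'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:a="http://www.normeinrete.it/nir/2.1/"
    exclude-result-prefixes="a">
  <xsl:output method="html" indent="no"/>
  <xsl:template match="/">
    <html><body><xsl:apply-templates select="//a:LeggeRegionale"/></body></html>
  </xsl:template>
  <xsl:template match="a:LeggeRegionale">
    <h1><xsl:value-of select="a:intestazione"/></h1>
    <xsl:apply-templates select="a:articolato/a:articolo"/>
  </xsl:template>
  <xsl:template match="a:articolo">
    <h2><xsl:value-of select="a:num"/><xsl:text> </xsl:text><xsl:value-of select="a:rubrica"/></h2>
  </xsl:template>
</xsl:stylesheet>
XSL;

$doc = new DOMDocument();
$doc->loadXML($xml);

$style = new DOMDocument();
$style->loadXML($xsl);

$proc = new XSLTProcessor();
$proc->importStylesheet($style);

// transformToXml() returns the serialized result as a string (HTML here, per xsl:output).
$html = $proc->transformToXml($doc);
echo $html;
```

    The stylesheet plays the same role as the XML "array definition" shown earlier, except that the result is a document rather than a PHP array.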
