PHP simplexml xpath search for value in an ELEMENT containing tab delimited text?

梦如初夏 2021-01-27 19:31

How to do a PHP simplexml xpath search for text value in a tab delimited ELEMENT and returning text from that same element at a different offset from where the search te

1 answer
  • 2021-01-27 20:23

    You are already quite far along, and you have analyzed the data you need to deal with well. The way you say you want to parse the data also looks fine to me. The only thing that could probably be improved a little is to take care not to do too much at once.

    One way to do so is to divide the problem into smaller ones. I will show you how that works by putting the code into multiple functions and methods. But let's start with a single function; this goes step by step, so you can follow the examples and build it up.

    One way to separate problems in PHP is to use functions. For example, write one function that searches the XML document; this makes the code look better and read more expressively:

    /**
     * search metadata element
     *
     *
     * @param SimpleXMLElement $xml
     * @param string           $resource metadata attribute
     * @param string           $lookup   metadata attribute
     * @param string           $value    search value
     *
     * @return SimpleXMLElement
     */
    function metadata_search(SimpleXMLElement $xml, $resource, $lookup, $value) {
    
        $xpath = "//METADATA[@Resource='{$resource}' and @Lookup='{$lookup}']"
                ."/DATA[contains(., '{$value}')]"; // use the $value parameter ($find was undefined)
    
        list($element)= $xml->xpath($xpath);
    
        return $element;
    }
    

    So now you can easily search the document, and the parameters are named and documented. All that is needed is to call the function and use the return value:

    $data = metadata_search($xml, 'Property', 'Area', 2);
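
    To try the call, a document to search is needed. The following is a minimal sketch under assumptions: the $xmlString sample, including its ROOT element and the leading and trailing tab in each row, is made up for illustration and only mimics the layout from the question.

```php
<?php
// Assumed sample document: tab-delimited rows with a leading and trailing tab.
$xmlString = "<ROOT>"
    . "<METADATA Resource=\"Property\" Lookup=\"Area\">"
    . "<COLUMNS>\tLongValue\tShortValue\tValue\t</COLUMNS>"
    . "<DATA>\tSalado\tSal\t5\t</DATA>"
    . "<DATA>\tAcadem\tAca\t2\t</DATA>"
    . "</METADATA>"
    . "</ROOT>";

function metadata_search(SimpleXMLElement $xml, $resource, $lookup, $value) {
    $xpath = "//METADATA[@Resource='{$resource}' and @Lookup='{$lookup}']"
            ."/DATA[contains(., '{$value}')]";

    // Take the first match only.
    list($element) = $xml->xpath($xpath);

    return $element;
}

$xml  = simplexml_load_string($xmlString);
$data = metadata_search($xml, 'Property', 'Area', 2);

// The match is still one raw tab-delimited string.
$fields = explode("\t", trim((string) $data));
print_r($fields);
```

    The result is the whole DATA element, so the values still have to be split by hand and nothing says which column each one belongs to.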
    

    This might not be the perfect function, but it is already an example. Besides functions you can also create objects. Objects bundle data with the functions that operate on it, so those functions have their own context. That is why they are called methods: they belong to the object, like the xpath() method of SimpleXMLElement.

    If you look at the function above, the first parameter is the $xml object, on which the xpath() method is then executed. In the end, all this function really does is create and run the XPath query based on the input variables.

    If we could bring that function directly into the $xml object, we would no longer need to pass it as the first parameter. That is the next step, and it works by extending SimpleXMLElement. We add just one new method that does the search, and it is pretty much the same as the function above. Extending SimpleXMLElement means we create a subtype of it: it has everything SimpleXMLElement already has, plus the new method:

    class MetadataElement extends SimpleXMLElement
    {
        /**
         * @param string           $resource metadata attribute
         * @param string           $lookup   metadata attribute
         * @param string           $value    search value
         *
         * @return SimpleXMLElement
         */
        public function search($resource, $lookup, $value) {
            $xpath = "//METADATA[@Resource='{$resource}' and @Lookup='{$lookup}']"
                ."/DATA[contains(., '{$value}')]";
    
            list($element)= $this->xpath($xpath);
    
            return $element;
        }
    }
    

    To bring this to life, we need to provide the name of this class when loading the XML string. Then the search method can be called directly:

    $xml  = simplexml_load_string($xmlString, 'MetadataElement');
    $data = $xml->search('Property', 'Area', 2);
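
    The class name passed as the second argument applies to every element created from the document, not only to the root. A small sketch to illustrate this; DemoElement, its shout() method and the tiny document are hypothetical and used only here:

```php
<?php
// Hypothetical subclass used only for this demonstration.
class DemoElement extends SimpleXMLElement
{
    public function shout()
    {
        return strtoupper($this->getName());
    }
}

$xml = simplexml_load_string('<root><child><leaf/></child></root>', 'DemoElement');

// Elements reached by property access are instances of the subclass ...
var_dump($xml->child->leaf instanceof DemoElement); // bool(true)
echo $xml->child->leaf->shout(), "\n";              // LEAF

// ... and so are the elements returned by xpath().
list($viaXpath) = $xml->xpath('//leaf');
var_dump($viaXpath instanceof DemoElement);         // bool(true)
```

    This is what later allows helper methods to be called on child elements such as COLUMNS and DATA.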
    

    Voilà, the search now lives in the SimpleXMLElement!

    But what to do with this $data? It's just an XML element and it still contains the tabs.

    Even worse, the context is lost: which metadata column does each value belong to? That is your problem. So we need to solve this next, but how?

    Honestly, there are many ways to do that. One idea I had was to create a table object out of the XML based on a metadata element:

    list($metadata) = $xml->xpath('//METADATA[1]');
    $csv = new CsvTable($metadata);
    echo $csv;
    

    Even with nice debug output:

    +---------+----------+-----+
    |LongValue|ShortValue|Value|
    +---------+----------+-----+
    |Salado   |Sal       |5    |
    +---------+----------+-----+
    |Academ   |Aca       |2    |
    +---------+----------+-----+
    |Rogers   |Rog       |1    |
    +---------+----------+-----+
    |Bartlett |Bar       |4    |
    +---------+----------+-----+
    

    But that is quite a lot of work if you are not yet fluent with programming objects, so building a whole table model of its own is maybe a bit much.

    Therefore I had another idea: why not continue to use the XML object you already use and change the XML in it a bit, so that it is in a better format for your purposes? From:

    <METADATA Resource="Property" Lookup="Area">
      <COLUMNS>   LongValue   ShortValue  Value   </COLUMNS>
      <DATA>  Salado  Sal 5   </DATA>
    

    To:

    <METADATA Resource="Property" Lookup="Area" transformed="1">
        <COLUMNS>   LongValue   ShortValue  Value   </COLUMNS>
        <DATA>
            <LongValue>Salado</LongValue><ShortValue>Sal</ShortValue><Value>5</Value>
        </DATA>
    

    This would allow you not only to search by a specific column name but also to read the other values in the same data element. If the search returns the $data element:

    $xml  = simplexml_load_string($xmlString, 'MetadataElement');
    $data = $xml->search('Property', 'Area', 5);
    echo $data->Value;     # 5
    echo $data->LongValue; # Salado
    

    If we leave an additional attribute on the metadata element, we can convert these elements while we search. If some data is found and the element is not yet converted, it will be converted.

    Because we do all of this inside the search method, the code using the search method need not change much (if at all; that depends a bit on the detailed needs you have, which I might not have fully grasped, but I think you get the idea). So let's put this to work. Because we don't want to do this all at once, we create multiple new methods to:

    1. transform a metadata element
    2. search inside the original element (this code we have already, we just move it)

    Along the way we will also create methods we deem helpful. You will notice that this is partly code you have already written (like in search()); it is just placed inside the $xml object now, where it more naturally belongs.

    Then finally these new methods will be put together in the existing search() method.

    So first of all, we create a helper method that parses the tab-delimited line into an array. It is basically your code; the only difference is that you do not need the string cast in front of trim(). Because this method is only needed internally, we make it private:

    private function asExplodedString() {
        return explode("\t", trim($this));
    }
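
    What the trim() and explode() combination does can be seen on a plain string. This is a minimal sketch; the sample rows are assumptions that mimic a DATA line with a leading and a trailing tab:

```php
<?php
// Assumed row layout: a leading and a trailing tab around the values.
$line = "\tSalado\tSal\t5\t";

// Without trim() the outer tabs yield empty first and last entries.
$raw = explode("\t", $line);
echo count($raw), "\n"; // 5

// trim() strips surrounding whitespace, tabs included, before splitting.
$fields = explode("\t", trim($line));
print_r($fields); // Salado, Sal, 5

// Caveat: trim() also swallows an empty first (or last) column value,
// so the remaining values no longer line up with the column names.
$shifted = explode("\t", trim("\t\tSal\t5\t"));
echo count($shifted), "\n"; // 2
```

    If empty values can occur at the edges of a row, removing exactly one tab from each end instead of trimming would keep the columns aligned.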
    

    Its name makes clear what it does: it gives back the tab-exploded array of the element itself. Remember, we are inside $xml, so now every XML element has this method. If you do not fully understand this yet, just go on; you can see how it works right below. We add only one more helper method:

    public function getParent() {
        list($parent) = $this->xpath('..') + array(0 => NULL);
        return $parent;
    }
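
    The "+ array(0 => NULL)" part supplies a default: the array union keeps index 0 of the xpath() result when it exists and falls back to NULL otherwise. A short sketch of both cases; the class name ParentAwareElement and the document are hypothetical, only getParent() is taken from above:

```php
<?php
// Hypothetical class name; only getParent() is taken from the text above.
class ParentAwareElement extends SimpleXMLElement
{
    public function getParent() {
        list($parent) = $this->xpath('..') + array(0 => NULL);
        return $parent;
    }
}

$xml = simplexml_load_string(
    '<METADATA><DATA>a</DATA></METADATA>',
    'ParentAwareElement'
);

// A child element reports the element that contains it.
echo $xml->DATA->getParent()->getName(), "\n"; // METADATA

// The root element has no parent element, so the default entry is used.
var_dump($xml->getParent()); // NULL
```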
    

    This method allows us to retrieve the parent element of an element. This is useful because when we find a data element we want to transform the metadata element, which is its parent. And because this method is of general use, I have chosen to make it public, so it can also be used in outside code. It solves a common problem and is therefore not as specific in nature as the explode method.

    So now we want to transform a metadata element. It takes a few more lines of code than the two helper methods above, but thanks to them things will not be complicated.

    We simply assume that the element this method is called on is the metadata element. We do not add checks here, to keep the code small. As this is again a private method, we do not even need to check: if this method is invoked on the wrong element, the fault lies inside the class itself, not in outside code. This is also a nice example of why I use private methods here: it is much more specific.

    So what we do now with the metadata element is actually quite simple: we fetch the COLUMNS element inside, explode the column names, then go over each data element, explode its data as well, and empty the data element so that we can add the column-named children to it. Finally we add an attribute to mark the element as transformed:

    private function transform() {
        $columns = $this->COLUMNS->asExplodedString();
    
        foreach ($this->DATA as $data) {
            $values  = $data->asExplodedString();
            $data[0] = ''; # set the string of the element (make <DATA></DATA> empty)
            foreach ($columns as $index => $name) {
                $data->addChild($name, $values[$index]);
            }
        }
    
        $this['transformed'] = 1;
    }
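
    One caveat with addChild() that this method inherits: as far as I know, its second argument is not escaped for ampersands, so a value such as "Barnes & Noble" would trigger an "unterminated entity reference" warning. A possible workaround, sketched here on made-up data, is to create the child by property assignment instead, which escapes the text:

```php
<?php
$data = simplexml_load_string('<DATA/>');

// Hypothetical column name and value, chosen to contain an ampersand.
$name  = 'LongValue';
$value = 'Barnes & Noble';

// Property assignment creates the child and escapes the text content.
$data->{$name} = $value;

echo $data->asXML(), "\n";
echo $data->LongValue, "\n"; // Barnes & Noble
```

    With the sample values used in this answer the difference does not matter; it only becomes relevant once the data contains an ampersand.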
    

    Okay, so what does this give us? Let's test it. To do that, we modify the existing search method to return the transformed data element by adding a single line of code:

    public function search($resource, $lookup, $value) {
        $xpath = "//METADATA[@Resource='{$resource}' and @Lookup='{$lookup}']"
            . "/DATA[contains(., '{$value}')]";
    
        list($element) = $this->xpath($xpath);
    
        $element->getParent()->transform();
        ###################################
    
        return $element;
    }
    

    And then we output it as XML:

    $data = $xml->search('Property', 'Area', 2);
    echo $data->asXML();
    

    This now gives the following output (beautified, it's on a single line normally):

    <DATA>
      <LongValue>Academ</LongValue>
      <ShortValue>Aca</ShortValue>
      <Value>2</Value>
    </DATA>
    

    And let's also check that the new attribute is set and all other data-elements of that metadata-table/block are transformed as well:

    echo $data->getParent()->asXML();
    

    And the output (beautified) as well:

    <METADATA Resource="Property" Lookup="Area" transformed="1">
      <COLUMNS> LongValue   ShortValue  Value   </COLUMNS>
      <DATA>
        <LongValue>Salado</LongValue>
        <ShortValue>Sal</ShortValue>
        <Value>5</Value>
      </DATA>
      ...
    

    This shows that the code works as intended. This might already solve your issue, e.g. if you always search for a number, the other columns do not contain numbers, and you only need to search once per metadata block. However, likely not; therefore the search method needs to be changed so that it performs the correct search and does the transformation internally.

    This time again we make use of $this to put a method on the concrete XML element. Two new methods; one to get a metadata element based on its attributes:

    private function getMetadata($resource, $lookup) {
        $xpath = "//METADATA[@Resource='{$resource}' and @Lookup='{$lookup}']";
        list($metadata) = $this->xpath($xpath);
        return $metadata;
    }
    

    And one to search a specific column of a metadata element:

    private function searchColumn($column, $value) {
        return $this->xpath("DATA[{$column}[contains(., '{$value}')]]");
    }
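
    Note that contains() is a substring match, so searching the Value column for 1 would also match 10. If exact matches are wanted, comparing with "=" is one option. A minimal sketch on made-up, already transformed rows:

```php
<?php
// Made-up, already transformed rows whose values share a digit.
$metadata = simplexml_load_string(
    '<METADATA>'
    . '<DATA><LongValue>Rogers</LongValue><Value>1</Value></DATA>'
    . '<DATA><LongValue>Temple</LongValue><Value>10</Value></DATA>'
    . '</METADATA>'
);

// Substring match: both "1" and "10" contain "1".
$loose = $metadata->xpath("DATA[Value[contains(., '1')]]");
echo count($loose), "\n"; // 2

// Equality match: only the row whose Value is exactly "1".
$exact = $metadata->xpath("DATA[Value = '1']");
echo count($exact), "\n";        // 1
echo $exact[0]->LongValue, "\n"; // Rogers
```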
    

    These two methods are then used in the main search method. It changes slightly: first it looks up the metadata element by its attributes, then it checks whether the transformation is needed, and then it searches by the Value column:

    public function search($resource, $lookup, $value)
    {
        $metadata = $this->getMetadata($resource, $lookup);
        if (!$metadata['transformed']) {
            $metadata->transform();
        }
    
        list($element) = $metadata->searchColumn('Value', $value);
    
        return $element;
    }
    

    And now the new way of searching is finally done. It searches only in the right column, and the transformation is done on the fly:

    $xml = simplexml_load_string($xmlString, 'MetadataElement');
    $data = $xml->search('Property', 'Area', 2);
    echo $data->LongValue, "\n"; # Academ
    

    Now that looks nice, and it looks totally easy to use! All the complexity went into MetadataElement. And what does that look like at a glance?

    /**
     * MetadataElement - Example for extending SimpleXMLElement
     *
     * @link http://stackoverflow.com/q/16281205/367456
     */
    class MetadataElement extends SimpleXMLElement
    {
        /**
         * @param string $resource metadata attribute
         * @param string $lookup   metadata attribute
         * @param string $value    search value
         *
         * @return SimpleXMLElement
         */
        public function search($resource, $lookup, $value)
        {
            $metadata = $this->getMetadata($resource, $lookup);
            if (!$metadata['transformed']) {
                $metadata->transform();
            }
    
            list($element) = $metadata->searchColumn('Value', $value);
    
            return $element;
        }
    
        private function getMetadata($resource, $lookup) {
            $xpath = "//METADATA[@Resource='{$resource}' and @Lookup='{$lookup}']";
            list($metadata) = $this->xpath($xpath);
            return $metadata;
        }
    
        private function searchColumn($column, $value) {
            return $this->xpath("DATA[{$column}[contains(., '{$value}')]]");
        }
    
        private function asExplodedString() {
            return explode("\t", trim($this));
        }
    
        public function getParent() {
            list($parent) = $this->xpath('..') + array(0 => NULL);
            return $parent;
        }
    
        private function transform() {
            $columns = $this->COLUMNS->asExplodedString();
    
            foreach ($this->DATA as $data) {
                $values  = $data->asExplodedString();
                $data[0] = ''; # set the string of the element (make <DATA></DATA> empty)
                foreach ($columns as $index => $name) {
                    $data->addChild($name, $values[$index]);
                }
            }
    
            $this['transformed'] = 1;
        }
    }
    

    Not too bad either. Many small methods, each with just a few lines of code, which is (relatively) easy to follow!

    So I hope this gives some inspiration; I know this was quite some text to read. Have fun!
