How to escape all invalid characters from DOM XPath Query?

Submitted by 我是研究僧i on 2019-12-02 07:54:02

Question


I have the following function, which finds values within an HTML DOM.

It works, but when I pass a $value such as Levi's Baby Overall, it breaks, because the , and ' characters are not escaped.

How to escape all invalid characters from DOM XPath Query?

private function extract($file,$url,$value) {
    $result = array();
    $i = 0;
    $dom = new DOMDocument();
    @$dom->loadHTMLFile($file);
    //use DOMXpath to navigate the html with the DOM
    $dom_xpath = new DOMXpath($dom);
    $elements = $dom_xpath->query("//*[text()[contains(., '" . $value . "')]]");
    // DOMXPath::query() returns false (not null) when the expression is invalid
    if ($elements !== false) {
        foreach ($elements as $element) {
            $nodes = $element->childNodes;
            foreach ($nodes as $node) {
                if (($node->nodeValue != null) && ($node->nodeValue === $value)) {
                    $xpath = preg_replace("/\/text\(\)/", "", $node->getNodePath());
                    $result[$i]['url'] = $url;
                    $result[$i]['value'] = $node->nodeValue;
                    $result[$i]['xpath'] = $xpath;
                    $i++;
                }
            }
        }
    }
    return $result;
}

Answer 1:


One shouldn't substitute placeholders in an XPath expression with arbitrary, user-provided strings -- because of the risk of (malicious) XPath injection.

To deal safely with such unknown strings, the solution is to use a pre-compiled XPath expression and to pass the user-provided string as a variable to it. This also completely eliminates the need to deal with nested quotes in the code.
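As far as I know, PHP's DOMXPath does not expose a way to bind variables into an expression, so the advice above cannot be applied literally there. Below is a minimal sketch of a substitute, not the variable-binding technique itself: keep the XPath a fixed string and do the comparison in PHP, so the user-provided value never becomes part of the expression. The helper name find_text_nodes_containing and the sample HTML are made up for illustration.

```php
<?php
// Hypothetical helper (not from the original answer): the XPath is a fixed
// string, and the untrusted value is only ever compared in PHP code.
function find_text_nodes_containing(DOMDocument $dom, string $value): array
{
    $matches = [];
    if ($value === '') {
        return $matches; // nothing meaningful to search for
    }
    $xpath = new DOMXPath($dom);
    // Fixed expression: select every text node in the document.
    foreach ($xpath->query('//text()') as $textNode) {
        if (strpos($textNode->nodeValue, $value) !== false) {
            $matches[] = $textNode;
        }
    }
    return $matches;
}

// Illustrative document and search value (assumptions, not from the question).
$dom = new DOMDocument();
$dom->loadHTML('<html><body><p>Levi\'s Baby Overall</p><p>Other "item"</p></body></html>');

$found = find_text_nodes_containing($dom, "Levi's Baby Overall");
echo count($found), "\n";
foreach ($found as $textNode) {
    // Path of the element that owns the matching text node.
    echo $textNode->parentNode->getNodePath(), "\n";
}
```

This trades XPath-side filtering for a PHP loop over every text node, which may be slower on large documents.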




Answer 2:


PHP has no built-in function for escaping or quoting strings for XPath queries. Furthermore, escaping strings for XPath is surprisingly difficult; this answer explains why: https://stackoverflow.com/a/1352556/1067003 . Here is a PHP port of the C# XPath quote function from that answer:

function xpath_quote(string $value):string{
    if(false===strpos($value,'"')){
        return '"'.$value.'"';
    }
    if(false===strpos($value,'\'')){
        return '\''.$value.'\'';
    }
    // if the value contains both single and double quotes, construct an
    // expression that concatenates all non-double-quote substrings with
    // the quotes, e.g.:
    //
    //    concat("'foo'", '"', "bar")
    $sb='concat(';
    $substrings=explode('"',$value);
    for($i=0;$i<count($substrings);++$i){
        $needComma=($i>0);
        if($substrings[$i]!==''){
            if($i>0){
                $sb.=', ';
            }
            $sb.='"'.$substrings[$i].'"';
            $needComma=true;
        }
        if($i < (count($substrings) -1)){
            if($needComma){
                $sb.=', ';
            }
            $sb.="'\"'";
        }
    }
    $sb.=')';
    return $sb;
}

Example usage:

$elements = $dom_xpath->query("//*[contains(text()," . xpath_quote($value) . ")]");
  • Note that the XPath itself contains no quoting characters (") around the value, because the xpath_quote function adds them (or returns the concat() equivalent if needed).
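To see the quoting in action end to end, here is a small self-contained sketch. It uses a compact variant of the same idea, under the made-up name xpath_literal, rather than the xpath_quote port above: when both quote types are present, it double-quotes every piece between the double quotes and joins the pieces with a single-quoted double quote. Unlike the port, it may emit empty "" arguments inside concat(), which XPath accepts. The sample HTML and value are assumptions for illustration.

```php
<?php
// Hypothetical compact variant of the quoting idea (name is illustrative).
function xpath_literal(string $value): string
{
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';   // no double quotes: wrap in double quotes
    }
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";   // no single quotes: wrap in single quotes
    }
    // Both quote types present: split on the double quote, wrap each piece in
    // double quotes, and join the pieces with the literal '"' inside concat().
    return 'concat("' . implode('", \'"\', "', explode('"', $value)) . '")';
}

// Illustrative document containing both quote types in one text node.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><p>Levi\'s "Baby" Overall</p></body></html>');
$xpath = new DOMXPath($dom);

$value = 'Levi\'s "Baby" Overall';
$expression = '//*[text()[contains(., ' . xpath_literal($value) . ')]]';
$nodes = $xpath->query($expression);

echo $expression, "\n";
echo $nodes->length, "\n";
```

The query shape here matches the one in the question, so it can be dropped into the original function in place of the hand-built quotes.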


Source: https://stackoverflow.com/questions/13026833/how-to-escape-all-invalid-characters-from-dom-xpath-query