Style unstyled links with DOM and XPath

长发绾君心 2020-12-21 16:46

For a system I am building, I am defining a general style, stored in LINKSTYLE, that should be applied to a elements that are not yet styled…

3 answers
  • 2020-12-21 17:02

    Use libxml_use_internal_errors(true) to suppress parsing errors stemming from loadHTML.

    • libxml_use_internal_errors() — Disable libxml errors and allow user to fetch error information
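
    As a minimal sketch of that suggestion (the sample markup is made up for illustration), the buffered errors can be read back with libxml_get_errors() and then cleared, so nothing is lost by silencing the warnings:

```php
<?php
// Hypothetical input: an unescaped "&" in the URL and an unknown <foo> tag.
$html = '<p><a href="page.php?a=1&b=2">link</a><foo>bar</foo></p>';

// Buffer libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html);

// Each buffered entry is a LibXMLError with level, code, line and message.
foreach (libxml_get_errors() as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

libxml_clear_errors();                  // empty the error buffer
libxml_use_internal_errors($previous);  // restore the earlier setting
```

    Restoring the previous setting keeps the change local to this one parse, which matters if other code in the same request relies on the default behaviour.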

    The XPath query is invalid because contains expects a value to search for in the style attribute.

    • fn:contains($arg1 as xs:string?, $arg2 as xs:string?) as xs:boolean

    If you want to find all anchors without a style attribute, just use

    //a[not(@style)]
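
    To illustrate the difference between the two predicates, here is a small sketch on made-up markup: contains() takes the string to search and the substring to look for, while not(@style) only tests whether the attribute exists.

```php
<?php
// Hypothetical markup, for illustration only.
$html = '<p><a href="#a" style="color:red">a</a>'
      . '<a href="#b" style="font-weight:bold">b</a>'
      . '<a href="#c">c</a></p>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

// contains(haystack, needle): anchors whose style attribute mentions "color".
$withColor = $xp->query('//a[contains(@style, "color")]');

// Anchors that have no style attribute at all.
$unstyled = $xp->query('//a[not(@style)]');

echo $withColor->length, ' ', $unstyled->length, "\n"; // prints "1 1"
```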
    

    You are not seeing your changes because you are returning the string stored in $html. Once you have loaded the string with DOMDocument, you have to serialize it back after you have run your query and modified the DOMDocument's internal representation of that string.

    Example:

    $html = <<< HTML
    <ul>
        <li><a href="#foo" style="font-weight:bold">foo</a></li>
        <li><a href="#bar">bar</a></li>
        <li><a href="#baz">baz</a></li>
    </ul>
    HTML;
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    $xp = new DOMXpath($dom);
    foreach ($xp->query('//a[not(@style)]') as $node) {
        $node->setAttribute('style', 'font-weight:bold');
    }
    echo $dom->saveHTML($dom->getElementsByTagName('ul')->item(0));
    

    Output:

    <ul>
    <li><a href="#foo" style="font-weight:bold">foo</a></li>
        <li><a href="#bar" style="font-weight:bold">bar</a></li>
        <li><a href="#baz" style="font-weight:bold">baz</a></li>
    </ul>
    

    Note that in order to use saveHTML with an argument, you need at least PHP 5.3.6.
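
    If the code may run on an older PHP, one possible guard is sketched below. The fallback to saveXML() is an assumption of this sketch, not part of the answer, and it produces XML rather than HTML serialization of the node:

```php
<?php
$dom = new DOMDocument;
$dom->loadHTML('<ul><li><a href="#bar">bar</a></li></ul>');
$ul = $dom->getElementsByTagName('ul')->item(0);

if (version_compare(PHP_VERSION, '5.3.6', '>=')) {
    // saveHTML() accepts a node argument from PHP 5.3.6 on.
    $fragment = $dom->saveHTML($ul);
} else {
    // Assumed fallback: XML serialization of the same node.
    $fragment = $dom->saveXML($ul);
}

echo $fragment, "\n";
```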

  • 2020-12-21 17:08

    The first error (before the edit) occurs when the document contains an & that is used for anything other than starting an entity reference (e.g. &quot;).

    Usually this happens in URLs, where & delimits GET parameters.

    You can ignore these errors using Gordon's suggestion, or fix them by replacing occurrences of & with &amp;.
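
    A rough sketch of the second option follows. The helper name escapeBareAmpersands is hypothetical, and the regular expression is only a heuristic: it leaves anything that already looks like a named, decimal or hexadecimal reference untouched and escapes every other &.

```php
<?php
// Hypothetical helper (not from the answers): escape "&" unless it already
// starts something that looks like an entity or character reference.
function escapeBareAmpersands(string $html): string
{
    // The negative lookahead skips &name;, &#123; and &#x1F; style references.
    return preg_replace(
        '/&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)/',
        '&amp;',
        $html
    );
}

echo escapeBareAmpersands('<a href="p.php?a=1&b=2&amp;c=3">x &quot;y&quot;</a>'), "\n";
// prints: <a href="p.php?a=1&amp;b=2&amp;c=3">x &quot;y&quot;</a>
```

    Running this over the markup before loadHTML() removes the cause of the warning instead of hiding it, at the cost of an extra pass over the string.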

  • 2020-12-21 17:13

    I was wondering if it's possible to solve this more CSS-wise, e.g. with a selector. In CSS3 it's possible to address only those <a> tags that don't have a style attribute:

    a:not([style]) {border:1px solid #000;}
    

    So if your documents already have a stylesheet it could be easily added.

    If not, then a <style> element must be added to the document. This can be done with DOMDocument as well, but I found it a bit complicated. However, I got it to work in a small experiment:

    libxml_use_internal_errors(true);    
    
    $html  = '<a href="#">test</a>'.
             '<a href="#" style="border:1px solid #000;">test2</a>';
    
    $dom = new DOMDocument();
    $dom->loadHtml($html);
    $dom->normalizeDocument();
    
    // ensure that there is a head element, body will always be there
    // because of loadHtml();
    $head = $dom->getElementsByTagName('head');
    if (0 == $head->length) {
        $head = $dom->createElement('head');
        $body = $dom->getElementsByTagName('body')->item(0);
        $head = $body->parentNode->insertBefore($head, $body);
    } else {
        $head=$head->item(0);
    }
    
    // append style tag to head.
    $css = 'a:not([style]) {border:1px solid #000;}';
    $style = $dom->createElement('style');
    $style->nodeValue=$css;
    $head->appendChild($style);
    
    $dom->formatOutput = true;
    $output = $dom->saveHtml();
    
    echo $output;
    

    Example output:

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html>
    <head><style>a:not([style]) {border:1px solid #000;}</style></head>
    <body>
    <a href="#">test</a><a href="#" style="border:1px solid #000;">test2</a>
    </body>
    </html>
    

    If the CSS clashes with other selectors of higher specificity, this is not an easy solution. !important might help, though.

    HTML Fragment

    As for getting the changed HTML fragment, here is some additional code that works with Gordon's suggestion. It returns just the inner HTML of the body tag; this time I played a bit with the SPL:

    // get html fragment ($xpath is a DOMXPath created for $dom)
    $output = implode('', array_map(
      function($node) use ($dom) { return $dom->saveXml($node); },
      iterator_to_array($xpath->query('//body/*'), false)));
    

    A foreach is definitely more readable and memory friendly:

    // get html fragment ($xpath is a DOMXPath created for $dom)
    $output = '';
    foreach ($xpath->query('//body/*') as $node) {
      $output .= $dom->saveXml($node);
    }
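
    Since the fragment snippets rely on an $xpath object that is never created in this answer, here is a self-contained sketch of the whole round trip. The sample markup and the style value are assumptions chosen for illustration:

```php
<?php
// Sample markup, made up for illustration.
$html = '<a href="#">test</a><a href="#" style="border:1px solid #000;">test2</a>';

libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);   // the $xpath object the fragment snippets use

// Style every anchor that has no style attribute yet.
foreach ($xpath->query('//a[not(@style)]') as $node) {
    $node->setAttribute('style', 'border:1px solid #000;');
}

// Serialize only the children of <body>, i.e. the fragment that was loaded.
$output = '';
foreach ($xpath->query('//body/*') as $node) {
    $output .= $dom->saveXML($node);
}

echo $output, "\n";
```

    Note that saveXML() emits XML-style serialization, so empty elements may come out self-closed; where that matters, saveHTML($node) is the alternative on PHP 5.3.6 and later.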
    