For a system I am building I am defining a general style stored in LINKSTYLE that should be applied to a elements that are not yet sty
Use libxml_use_internal_errors(true) to suppress parsing errors stemming from loadHTML.
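A minimal sketch of how that call is typically wrapped around loadHTML. The sample markup is made up for illustration; whether the parser actually reports an error for it depends on the libxml2 version in use.

```php
// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument;
// Hypothetical input: the bare "&" in the URL is the kind of thing
// that usually makes the parser complain.
$dom->loadHTML('<a href="page.php?a=1&b=2">link</a>');

// Inspect whatever the parser reported, then free the error buffer.
$errors = libxml_get_errors();
libxml_clear_errors();

// Restore the previous error-handling mode.
libxml_use_internal_errors($previous);

echo count($errors), " parse error(s) collected\n";
```

Clearing the buffer with libxml_clear_errors() keeps collected errors from piling up when many documents are parsed in one request.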
The XPath query is invalid because contains() expects a second argument: the value to search for in the style attribute.
If you want to find all anchors without a style attribute, just use
//a[not(@style)]
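To make the difference concrete, here is a small sketch with made-up markup: contains() takes the string to search in and the substring to look for, whereas not(@style) only tests whether the attribute exists at all.

```php
$dom = new DOMDocument;
$dom->loadHTML('<p><a href="#a" style="color:red">a</a><a href="#b">b</a></p>');
$xp = new DOMXPath($dom);

// contains() needs both a haystack (@style) and a needle.
$red = $xp->query('//a[contains(@style, "color:red")]');

// not(@style) matches anchors that have no style attribute at all.
$unstyled = $xp->query('//a[not(@style)]');

echo $red->length, ' ', $unstyled->length; // prints "1 1"
```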
You are not seeing your changes because you are returning the string stored in $html. Once you have loaded the string with DOMDocument, you have to serialize it back after you have run your query and modified the DOMDocument's internal representation of that string.
Example:
$html = <<< HTML
<ul>
<li><a href="#foo" style="font-weight:bold">foo</a></li>
<li><a href="#bar">bar</a></li>
<li><a href="#baz">baz</a></li>
</ul>
HTML;
$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);
foreach ($xp->query('//a[not(@style)]') as $node) {
    $node->setAttribute('style', 'font-weight:bold');
}
echo $dom->saveHTML($dom->getElementsByTagName('ul')->item(0));
Output:
<ul>
<li><a href="#foo" style="font-weight:bold">foo</a></li>
<li><a href="#bar" style="font-weight:bold">bar</a></li>
<li><a href="#baz" style="font-weight:bold">baz</a></li>
</ul>
Note that in order to use saveHTML with an argument, you need at least PHP 5.3.6.
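For older PHP versions, one possible workaround is to fall back to saveXML with a node argument. This is only a sketch with made-up markup, and the XML serializer writes empty elements such as br in self-closing form, so the result is not always identical to what saveHTML would give.

```php
$dom = new DOMDocument;
$dom->loadHTML('<ul><li><a href="#bar">bar</a></li></ul>');
$ul = $dom->getElementsByTagName('ul')->item(0);

// saveHTML($node) needs PHP 5.3.6 or later; saveXML($node) is the fallback.
$fragment = version_compare(PHP_VERSION, '5.3.6', '>=')
    ? $dom->saveHTML($ul)
    : $dom->saveXML($ul);

echo $fragment;
```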
The first error (before editing) occurs when you use an & inside the document for purposes other than creating an entity reference (e.g. &quot;).
Usually this happens in URLs when you delimit GET parameters.
You can ignore these errors using Gordon's suggestion, or fix them by replacing occurrences of & with &amp;.
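A rough sketch of that replacement, assuming a regular expression is acceptable for the input at hand. The sample string is invented, and the pattern is a heuristic: it escapes only ampersands that do not already look like the start of an entity reference.

```php
// Hypothetical input with an unescaped "&" between GET parameters.
$html = '<a href="page.php?a=1&b=2">link &quot;x&quot; &#38; more</a>';

// Escape only ampersands that are not followed by something shaped like
// a named (&quot;), decimal (&#38;) or hexadecimal (&#x26;) reference.
$fixed = preg_replace(
    '/&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#x[0-9a-fA-F]+);)/',
    '&amp;',
    $html
);

echo $fixed;
// prints: <a href="page.php?a=1&amp;b=2">link &quot;x&quot; &#38; more</a>
```

Existing references are left alone, so running the replacement twice does not double-escape anything.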
I was wondering if it's possible to solve this more CSS-wise, e.g. with a selector. In CSS3 it's possible to address only those <a>
tags that don't have the style
attribute:
a:not([style]) {border:1px solid #000;}
So if your documents already have a stylesheet, the rule could easily be added there.
If not, then a <style>
element must be added to the document. This can be done with DOMDocument as well, but I found it a bit complicated. However, I got it to work in a little experiment:
libxml_use_internal_errors(true);
$html = '<a href="#">test</a>'.
'<a href="#" style="border:1px solid #000;">test2</a>';
$dom = new DOMDocument();
$dom->loadHtml($html);
$dom->normalizeDocument();
// ensure that there is a head element, body will always be there
// because of loadHtml();
$head = $dom->getElementsByTagName('head');
if (0 == $head->length) {
    $head = $dom->createElement('head');
    $body = $dom->getElementsByTagName('body')->item(0);
    $head = $body->parentNode->insertBefore($head, $body);
} else {
    $head = $head->item(0);
}
// append style tag to head.
$css = 'a:not([style]) {border:1px solid #000;}';
$style = $dom->createElement('style');
$style->nodeValue=$css;
$head->appendChild($style);
$dom->formatOutput = true;
$output = $dom->saveHtml();
echo $output;
Example output:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head><style>a:not([style]) {border:1px solid #000;}</style></head>
<body>
<a href="#">test</a><a href="#" style="border:1px solid #000;">test2</a>
</body>
</html>
If the CSS clashes with other, more specific selectors, this is not an easy solution. !important
might help, though.
As for getting the changed HTML fragment, here is some additional code that can work with Gordon's suggestion. It returns just the inner HTML of the body tag; this time I played a bit with the SPL:
// get html fragment ($xpath is assumed to be a DOMXPath created for $dom)
$output = implode('', array_map(
    function ($node) use ($dom) { return $dom->saveXml($node); },
    iterator_to_array($xpath->query('//body/*'), false)
));
A foreach is definitely more readable and memory friendly:
// get html fragment
$output = '';
foreach ($xpath->query('//body/*') as $node) {
    $output .= $dom->saveXml($node);
}