Question
I was following this question on how to retrieve all tags in PHP.
Specifically (under WordPress), I'd like to find all <pre> tags, with all the available information (attributes and text). However, it seems that I'm not that skilled with preg_match, so I'm turning to you.
My text does contain various <pre> tags, some with attributes, some with just text. My function is this:
function getPreTags($string) {
    $pattern = "/<pre\s?(.*)>(.*)<\/pre>/";
    preg_match($pattern, $string, $matches);
    return $matches[1];
}
I've reduced it to a test with just one <pre> tag, but I get count(getPreTags(myHTMLbody)) = 0, and I don't know why. This is the test string:
<pre class="wp-code-highlight prettyprint prettyprinted" style=""><span class="com">Whatever <</span> I've written >> here <span class="something">should be taken care of</span></pre>
Any hint?
Cheers!
Answer 1:
As ever, parsing HTML with regex is never going to cut it. There are so many things to take into account (tag soup, spacing: <pre>==< pre >==<\n\t\sPrE\n\n>...) that any regex will fail you at some point. That's why parsers exist and are readily available.
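To make one such failure concrete, here is a minimal sketch. The sample string in $html is made up for illustration and is not the asker's data. It runs the pattern from the question against a <pre> block whose content spans two lines: the regex finds nothing, because . does not match a newline unless the s modifier is set, while the DOM parser still finds the element.

```php
<?php
// Hypothetical sample input: a <pre> block whose content spans two lines.
$html = "<pre class=\"code\">line one\nline two</pre>";

// The pattern from the question; "." does not match "\n" by default.
$pattern = "/<pre\s?(.*)>(.*)<\/pre>/";
$regexHits = preg_match($pattern, $html, $matches);

// The DOM parser is not affected by line breaks inside the element.
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
$domHits = $dom->getElementsByTagName('pre')->length;

echo 'regex matches: ', $regexHits, PHP_EOL; // regex matches: 0
echo 'DOM matches: ', $domHits, PHP_EOL;     // DOM matches: 1
```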
That said: I have no idea why the other answers go through the trouble of using an instance of DOMXPath, when you need all pre tags, including those without attributes.
I'd go for something simpler, like:
$dom = new DOMDocument;
$dom->loadHTML($htmlString);
$preTags = $dom->getElementsByTagName('pre');
foreach($preTags as $pre)
{
    echo $pre->nodeValue, PHP_EOL;
    if ($pre->hasAttributes())
    {//if there are attributes
        foreach($pre->attributes as $attribute)
        {
            //do something with attribute
            echo 'Attribute: ', $attribute->name, ' = ', $attribute->value, PHP_EOL;
        }
    }
}
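If you would rather get the data back than echo it, the same loop can be wrapped in a function. This is only a sketch: the name collectPreTags and the layout of the returned array are assumptions made for illustration, and libxml_use_internal_errors is used to keep warnings about sloppy markup out of the output.

```php
<?php
/**
 * Collect every <pre> element together with its attributes and text.
 * The function name and the returned array layout are illustrative assumptions.
 */
function collectPreTags(string $html): array
{
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the previous setting

    $result = [];
    foreach ($dom->getElementsByTagName('pre') as $pre) {
        $attributes = [];
        foreach ($pre->attributes as $attribute) {
            $attributes[$attribute->name] = $attribute->value;
        }
        $result[] = [
            'attributes' => $attributes,      // empty array when the tag has none
            'text'       => $pre->nodeValue,  // text content, nested tags stripped
        ];
    }
    return $result;
}

// Usage with a made-up input string:
$tags = collectPreTags('<pre>A B C</pre><pre id="pre1" class="x">X Y Z</pre>');
print_r($tags);
```

Each entry then carries both pieces of information the question asks for, and tags without attributes are included with an empty attributes array.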
The methods and properties available to you are documented on these pages:
- Attributes: DOMAttr class docs
- Nodes: DOMNode class docs
- Document: DOMDocument class docs
Answer 2:
You are better off using a DOM parser to parse HTML. Consider this code:
$html = <<< EOF
<a href="http://example.com/foo.htm" class="curPage">Click link1</a> morestuff
<pre>A B C</pre>
<a href="http://notexample.com/foo/bar">notexample.com</a> morestuff
<pre id="pre1">X Y Z</pre>
<a href="http://example.com/foo.htm">Click link1</a>
<pre id="pre2">1 2 3</pre>
EOF;
// create a new DOM object
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);
// select only the pre tags that have at least one attribute (use "//pre" to select every pre tag)
$nodelist = $xpath->query("//pre[@*]");
// iterate through selected nodes and print them
for($i=0; $i < $nodelist->length; $i++) {
    $node = $nodelist->item($i);
    var_dump($node->nodeValue);
}
OUTPUT:
string(5) "X Y Z"
string(5) "1 2 3"
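Note that //pre[@*] skips the first, attribute-less <pre>. As a hedged sketch of how the XPath expression changes what gets selected, the snippet below runs three queries over a shortened, made-up version of the input: //pre for every <pre>, //pre[@*] for those with at least one attribute, and //pre[@id="pre2"] for one specific element.

```php
<?php
// Shortened, made-up version of the input used above.
$html = '<pre>A B C</pre><pre id="pre1">X Y Z</pre><pre id="pre2">1 2 3</pre>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$all      = $xpath->query('//pre');              // every <pre>
$withAttr = $xpath->query('//pre[@*]');          // only those with an attribute
$byId     = $xpath->query('//pre[@id="pre2"]');  // one specific element

echo 'all: ', $all->length, ', with attributes: ', $withAttr->length, PHP_EOL;

// Print every <pre>, showing its id when it has one.
foreach ($all as $node) {
    $id = $node->hasAttribute('id') ? $node->getAttribute('id') : '(none)';
    echo $id, ': ', $node->nodeValue, PHP_EOL;
}
```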
Answer 3:
If the data is well-formed XML, you could use an XPath expression.
Just a very quick one:
<?xml version="1.0" encoding="UTF-8"?>
<html>
  <head>
    <title>Test</title>
  </head>
  <body>
    <pre>1</pre>
    <pre>2</pre>
    <pre>3</pre>
  </body>
</html>
And then a PHP script like this:
<?php
$xmldoc = new DOMDocument();
$xmldoc->load('test.xml');
$xpathvar = new DOMXPath($xmldoc);
echo $xpathvar->evaluate('count(*//pre)');
?>
This should also work with HTML/XML snippets, as long as they are well-formed.
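A minimal sketch of the snippet case, assuming the snippet is well-formed XML with a single root element (the $snippet string is made up for illustration). It loads the string with loadXML instead of reading a file, and it uses count(//pre) rather than count(*//pre) so that a <pre> sitting at the root would be counted as well.

```php
<?php
// Hypothetical snippet; it must be well-formed XML with a single root element.
$snippet = '<div><pre>1</pre><pre id="b">2</pre><p>text</p></div>';

$xmldoc = new DOMDocument();
$xmldoc->loadXML($snippet);          // parse from a string instead of a file
$xpathvar = new DOMXPath($xmldoc);

// evaluate() returns a float for count(), so cast it before use.
$count = (int) $xpathvar->evaluate('count(//pre)');
echo $count, PHP_EOL; // 2
```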
Source: https://stackoverflow.com/questions/19763868/find-all-pre-tags-in-php-with-attributes