Question
I was following this question on how to retrieve all tags in PHP.
Specifically (under WordPress), I'd like to find all <pre> tags, with all the available information (attributes and text). However, it seems that I'm not that skilled in preg_match, so I'm turning to you.
My text does contain various <pre> tags, some with attributes, some with just text. My function is this:
function getPreTags($string) {
$pattern = "/<pre\s?(.*)>(.*)<\/pre>/";
preg_match($pattern, $string, $matches);
return $matches[1];
}
I've reduced it to a test with just one <pre> tag, but I get count(getPreTags(myHTMLbody)) = 0, and I don't know why. This is the test string:
<pre class="wp-code-highlight prettyprint prettyprinted" style=""><span class="com">Whatever <</span> I've written >> here <span class="something">should be taken care of</span></pre>
Any hint?
Cheers!
Answer 1:
As ever, parsing HTML with regex is never going to cut it. There are so many things to take into account (tag-soup, spacing: <pre> == < pre > == <\n\t\sPrE\n\n> ...), any regex will fail you at some point. That's why there are such things as parsers, readily available.
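To make that concrete, here is a minimal sketch that runs the pattern from the question against three inputs. The sample strings are made up for illustration; only the pattern comes from the question. Without the `i` and `s` modifiers, the pattern misses upper-case tags and any `<pre>` block that spans several lines, which is the usual shape of code blocks in post content.

```php
<?php
// The pattern from the question, unchanged.
$pattern = "/<pre\s?(.*)>(.*)<\/pre>/";

// Hypothetical sample inputs, made up for this illustration.
$samples = [
    'single line' => '<pre class="a">text</pre>',
    'upper case'  => '<PRE>text</PRE>',
    'multi-line'  => "<pre>\nline 1\nline 2\n</pre>",
];

$results = [];
foreach ($samples as $label => $html) {
    // preg_match() returns 1 on a match and 0 when nothing matches.
    $results[$label] = preg_match($pattern, $html);
    echo $label, ': ', $results[$label], PHP_EOL;
}
```

Only the first sample matches. Adding modifiers would patch these two cases, but nested or malformed markup would still slip through, which is the point made above.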
That said: I have no idea why the other answers go through the trouble of using an instance of DOMXPath, when you need all pre tags, including those without attributes.
I'd go for something simpler, like:
$dom = new DOMDocument;
$dom->loadHTML($htmlString);
$preTags = $dom->getElementsByTagName('pre');
foreach ($preTags as $pre)
{
    echo $pre->nodeValue, PHP_EOL;
    if ($pre->hasAttributes())
    {//if there are attributes
        foreach ($pre->attributes as $attribute)
        {
            //do something with attribute
            echo 'Attribute: ', $attribute->name, ' = ', $attribute->value, PHP_EOL;
        }
    }
}
The available methods and properties are documented on these pages:
- Attributes: DOMAttr class docs
- Nodes: DOMNode class docs
- Document: DOMDocument class docs
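If you'd rather get the data back as an array than echo it, the same approach could be wrapped roughly like this. This is only a sketch: the function name `collectPreTags` and the array layout are my own choices, not anything WordPress or the DOM extension provides. It also silences libxml warnings, on the assumption that post content is rarely valid HTML.

```php
<?php
// Hypothetical helper: returns one entry per <pre> element,
// holding its text content and a name => value map of its attributes.
function collectPreTags(string $html): array
{
    $dom = new DOMDocument;
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($dom->getElementsByTagName('pre') as $pre) {
        $attributes = [];
        foreach ($pre->attributes as $attribute) {
            $attributes[$attribute->name] = $attribute->value;
        }
        $result[] = ['text' => $pre->nodeValue, 'attributes' => $attributes];
    }
    return $result;
}

// Made-up input: one <pre> with attributes, one without.
$tags = collectPreTags('<pre class="code" style="">first</pre><pre>second</pre>');
echo count($tags), PHP_EOL;
print_r($tags[0]['attributes']);
```

Tags without attributes simply get an empty `attributes` array, so the caller does not need a separate `hasAttributes()` check.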
Answer 2:
You'd be better off using a DOM parser to parse HTML. Consider this code:
$html = <<< EOF
<a href="http://example.com/foo.htm" class="curPage">Click link1</a> morestuff
<pre>A B C</pre>
<a href="http://notexample.com/foo/bar">notexample.com</a> morestuff
<pre id="pre1">X Y Z</pre>
<a href="http://example.com/foo.htm">Click link1</a>
<pre id="pre2">1 2 3</pre>
EOF;
// create a new DOM object
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);
// select all pre tags with attributes
$nodelist = $xpath->query("//pre[@*]");
// iterate through selected nodes and print them
for ($i = 0; $i < $nodelist->length; $i++) {
    $node = $nodelist->item($i);
    var_dump($node->nodeValue);
}
OUTPUT:
string(5) "X Y Z"
string(5) "1 2 3"
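Note that `//pre[@*]` only selects the tags that carry at least one attribute. As a rough sketch of how the expression changes what you get back, the snippet below runs three queries against a shortened, made-up version of the markup above: all `<pre>` tags, those with any attribute, and one picked by its `id`.

```php
<?php
// Shortened sample markup, made up for this illustration.
$html = '<pre>A B C</pre><pre id="pre1">X Y Z</pre><pre id="pre2">1 2 3</pre>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$all            = $xpath->query('//pre');             // every <pre>
$withAttributes = $xpath->query('//pre[@*]');         // only <pre> with attributes
$byId           = $xpath->query('//pre[@id="pre2"]'); // one specific <pre>

echo $all->length, ' ', $withAttributes->length, PHP_EOL;
echo $byId->item(0)->nodeValue, PHP_EOL;
```

If you need every tag regardless of attributes, plain `//pre` is the expression to use.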
Answer 3:
If the data is well-formed XML, you could use an XPath expression.
Just a very quick example:
<?xml version="1.0" encoding="UTF-8"?>
<html>
<head>
<title>Test</title>
</head>
<body>
<pre>1</pre>
<pre>2</pre>
<pre>3</pre>
</body>
</html>
And then a PHP script like this:
<?php
$xmldoc = new DOMDocument();
$xmldoc->load('test.xml');
$xpathvar = new DOMXPath($xmldoc);
echo $xpathvar->evaluate('count(*//pre)');
?>
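This should also work with HTML/XML snippets held in a string, as long as an XML snippet is well-formed; HTML snippets can be loaded with loadHTML() instead of load().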
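For snippets held in a string rather than a file, a sketch along these lines might do. The sample strings are made up, and I've used `count(//pre)` rather than the expression above, purely as a matter of taste. Note that `loadXML()` needs a single root element, while `loadHTML()` accepts a bare fragment and wraps it in `<html><body>` itself.

```php
<?php
// XML snippet: must be well-formed, with exactly one root element.
$xml = new DOMDocument();
$xml->loadXML('<root><pre>1</pre><pre>2</pre></root>');
$xmlCount = (new DOMXPath($xml))->evaluate('count(//pre)');

// HTML snippet: no root element needed, the parser adds html/body.
$html = new DOMDocument();
libxml_use_internal_errors(true);
$html->loadHTML('<pre>1</pre><pre>2</pre><pre>3</pre>');
$htmlCount = (new DOMXPath($html))->evaluate('count(//pre)');

// count() in XPath yields a number, which PHP returns as a float.
echo $xmlCount, ' ', $htmlCount, PHP_EOL;
```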
Source: https://stackoverflow.com/questions/19763868/find-all-pre-tags-in-php-with-attributes