Find all <pre> tags in PHP (with attributes)

Posted by 我怕爱的太早我们不能终老 on 2019-12-23 03:37:07

Question


I was following this question on how to retrieve all tags in PHP.

Specifically (under WordPress), I'd like to find all <pre> tags, with all the available information (attributes and text). However, I'm not that skilled with preg_match, so I'm turning to you.

My text does contain various <pre> tags, some with attributes, some with just text. My function is this:

function getPreTags($string) {
    $pattern = "/<pre\s?(.*)>(.*)<\/pre>/";
    preg_match($pattern, $string, $matches);
    return $matches[1];
}

I've reduced to a test with just one <pre> tag, but I get count(getPreTags(myHTMLbody)) = 0, and I don't know why. This is the test string:

<pre class="wp-code-highlight prettyprint prettyprinted" style=""><span class="com">Whatever &lt;</span> I've written &gt;&gt; here <span class="something">should be taken care of</span></pre>

Any hint?

Cheers!


Answer 1:


As ever, parsing HTML with regex is never going to cut it. There are so many things to take into account (tag soup, spacing: <pre>==< pre >==<\n\t\sPrE\n\n>...) that any regex will fail you at some point. That's why parsers exist and are readily available.
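To illustrate the point with the pattern from the question: one likely reason it finds nothing is that "." does not match newlines by default, so a <pre> block spanning several lines never matches. The sketch below assumes a made-up two-line input and a hypothetical adjusted pattern; it only shows how easily a regex breaks, and the adjusted pattern is still brittle against real-world markup.

```php
<?php
// Hypothetical input: a <pre> block whose content spans two lines.
$html = "<pre class=\"demo\">line one\nline two</pre>";

// Pattern from the question: "." stops at the newline, so nothing matches.
$original = '/<pre\s?(.*)>(.*)<\/pre>/';

// Hypothetical adjusted pattern (an assumption, not a recommendation):
// s = "." also matches newlines, i = case-insensitive, .*? = non-greedy.
$adjusted = '/<pre\b([^>]*)>(.*?)<\/pre>/si';

$foundOriginal = preg_match($original, $html, $m1);
$foundAdjusted = preg_match_all($adjusted, $html, $m2, PREG_SET_ORDER);

var_dump($foundOriginal); // number of matches for the original pattern
var_dump($foundAdjusted); // number of matches for the adjusted pattern
```

Even the adjusted pattern gives up on nested or malformed tags, which is why the parser approach is the safer route.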

That said, I have no idea why the other answers go to the trouble of using an instance of DOMXPath when you need all pre tags, including those without attributes.
I'd go for something simpler, like:

$dom = new DOMDocument;
$dom->loadHTML($htmlString);
$preTags = $dom->getElementsByTagName('pre');
foreach($preTags as $pre)
{
    echo $pre->nodeValue, PHP_EOL;
    if ($pre->hasAttributes())
    {//if there are attributes
        foreach($pre->attributes as $attribute)
        {
            //do something with attribute
            echo 'Attribute: ', $attribute->name, ' = ', $attribute->value, PHP_EOL;
        }
    }
}

What methods and properties are available to you can be found easily on these pages:

  • Attributes: DOMAttr class docs
  • Nodes: DOMNode class docs
  • Document: DOMDocument class docs
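If you want the result as a return value rather than echoed output, the loop above could be wrapped roughly like this. This is a minimal sketch: the function name collectPreTags, the array layout, and the sample input are my own assumptions, and suppressing libxml warnings is optional.

```php
<?php
// Hypothetical helper: returns the text and attributes of every <pre> element.
function collectPreTags(string $html): array
{
    $dom = new DOMDocument;
    // Assumption: keep libxml quiet about imperfect real-world markup.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($dom->getElementsByTagName('pre') as $pre) {
        $attributes = [];
        foreach ($pre->attributes as $attribute) {
            $attributes[$attribute->name] = $attribute->value;
        }
        // nodeValue is the decoded text content, without any nested markup
        $result[] = ['text' => $pre->nodeValue, 'attributes' => $attributes];
    }
    return $result;
}

$tags = collectPreTags('<pre class="a" style="">Whatever &lt; here</pre><pre>plain</pre>');
print_r($tags);
```

Each entry holds the plain text plus a name => value map of attributes, which is empty for a bare <pre>.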



Answer 2:


You'd be better off using a DOM parser to parse HTML. Consider this code:

$html = <<< EOF
<a href="http://example.com/foo.htm" class="curPage">Click link1</a> morestuff
<pre>A    B    C</pre>
<a href="http://notexample.com/foo/bar">notexample.com</a> morestuff
<pre id="pre1">X    Y    Z</pre>
<a href="http://example.com/foo.htm">Click link1</a>
<pre id="pre2">1    2    3</pre>
EOF;

// create a new DOM object
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);

// select all pre tags with attributes
$nodelist = $xpath->query("//pre[@*]");

// iterate through selected nodes and print them
for($i=0; $i < $nodelist->length; $i++) {
    $node = $nodelist->item($i);
    var_dump($node->nodeValue);
}

OUTPUT:

string(11) "X    Y    Z"
string(11) "1    2    3"
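Note that nodeValue only gives the text; nested tags such as the <span> elements in the question are dropped. If you also need that markup, something along these lines might work. It is a sketch with made-up sample HTML: it uses "//pre" (without the [@*] predicate) to select every <pre>, and DOMDocument::saveHTML() on each child node to rebuild the inner markup.

```php
<?php
// Hypothetical sample: one <pre> with nested markup, one plain <pre>.
$html = '<pre id="pre1"><span class="com">X &lt;</span> Y</pre><pre>Z</pre>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$markup = [];
// "//pre" selects every <pre>, with or without attributes
foreach ($xpath->query('//pre') as $node) {
    $inner = '';
    foreach ($node->childNodes as $child) {
        // serialise each child node, tags included
        $inner .= $doc->saveHTML($child);
    }
    $markup[] = $inner;
}
print_r($markup);
```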



Answer 3:


If the data is well-formed XML, you could use an XPath expression.

A very quick example:

<?xml version="1.0" encoding="UTF-8"?>
<html>
  <head>
    <title>Test</title>
  </head>
  <body>
    <pre>1</pre>
    <pre>2</pre>
    <pre>3</pre>
  </body>
</html>

And then a PHP script like this:

<?php
$xmldoc = new DOMDocument();
$xmldoc->load('test.xml');

$xpathvar = new DOMXPath($xmldoc);

// count every <pre> element in the document
echo $xpathvar->evaluate('count(//pre)');
?>

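This should also work with HTML/XML snippets held in a string: use loadXML() (or loadHTML() for markup that is not well-formed XML) instead of load().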
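For a snippet in a string, the same idea might look like the sketch below. The snippet itself and the variable names are assumptions for illustration; the second expression additionally counts only the <pre> elements that carry at least one attribute.

```php
<?php
// Hypothetical snippet held in a string rather than in test.xml.
$snippet = '<div><pre>1</pre><pre class="x">2</pre><pre>3</pre></div>';

$xmldoc = new DOMDocument();
$xmldoc->loadXML($snippet); // well-formed XML only; loadHTML() is more forgiving

$xpath = new DOMXPath($xmldoc);
$total = $xpath->evaluate('count(//pre)');               // all <pre> elements
$withAttributes = $xpath->evaluate('count(//pre[@*])');  // only those with attributes

echo $total, ' ', $withAttributes, PHP_EOL; // prints "3 1"
```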



Source: https://stackoverflow.com/questions/19763868/find-all-pre-tags-in-php-with-attributes
