regex help with getting tag content in PHP

有刺的猬 2021-01-28 03:47

So I have this code:

function getTagContent($string, $tagname) {

    $pattern = "/<$tagname.*?>(.*)<\/$tagname>/";
    preg_match($pattern, $string         


        
4 answers
  • 2021-01-28 04:06

    Try DOM instead:

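    $url = "http://www.freakonomics.com/2008/09/24/wall-street-jokes-please/";
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);  // don't print warnings about malformed real-world HTML
    $doc->loadHTMLFile($url);          // fetch and parse the page
    // Print the text of every <title> element; nodeValue keeps any line breaks.
    foreach ($doc->getElementsByTagName('title') as $title) {
        echo $title->nodeValue . "\n";
    }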
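A minimal sketch of the same DOM approach applied to an HTML string instead of a URL, which is closer to the getTagContent($string, $tagname) signature in the question. The helper name getTagContentsDom and the sample HTML are hypothetical; the title deliberately spans several lines to show that DOM parsing is unaffected by line breaks.

```php
<?php
// Hypothetical helper: collect the text of every <$tagname> element in an
// HTML string using DOMDocument instead of a regular expression.
function getTagContentsDom(string $html, string $tagname): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings about malformed HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $contents = [];
    foreach ($doc->getElementsByTagName($tagname) as $node) {
        // nodeValue is the element's text; trim the surrounding line breaks.
        $contents[] = trim($node->nodeValue);
    }
    return $contents;
}

// Made-up page whose <title> content is spread over several lines.
$html = "<html><head><title>\n  Example title\n</title></head><body></body></html>";
print_r(getTagContentsDom($html, 'title'));
```

Returning an array rather than a single string is a design choice here: a document can contain several elements with the same tag name.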
    
    0 讨论(0)
  • 2021-01-28 04:08

    Probably because the title spans multiple lines. Add the s modifier so that the dot also matches newline characters.

    $pattern = "/<$tagname.*?>(.*)<\/$tagname>/s";
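A small sketch of the difference the s modifier makes, using the pattern from this answer on a made-up HTML snippet whose title is split across lines (the snippet and variable names are illustrative only).

```php
<?php
// Made-up input: the <title> content spans several lines.
$html = "<head>\n<title>\nExample title\n</title>\n</head>";
$tagname = 'title';

$withoutS = "/<$tagname.*?>(.*)<\/$tagname>/";   // . does not match "\n"
$withS    = "/<$tagname.*?>(.*)<\/$tagname>/s";  // s: . also matches "\n"

var_dump(preg_match($withoutS, $html));  // int(0): no match
var_dump(preg_match($withS, $html, $m)); // int(1): match found
var_dump(trim($m[1]));                   // string(13) "Example title"
```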
    
    0 讨论(0)
  • 2021-01-28 04:13

    The 'title' tag is not on the same line as its closing tag, so your preg_match doesn't find it.

    In Perl, you can add the /s modifier so that . also matches newlines, as though the whole input were on one line. PHP's preg_match uses PCRE and accepts the same s modifier.

    But this is just one of the reasons why parsing XML and its variants (such as HTML) with regular expressions is a bad idea.
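One concrete illustration of why regex-based HTML parsing is fragile, sketched under the assumption of a made-up input: a quoted attribute value may legally contain ">", which a [^>]* pattern cannot account for, while an HTML parser such as DOMDocument handles the quoting.

```php
<?php
// Made-up input: the title attribute's value contains ">".
$html = '<p title="a > b">Hello</p>';

// Regex approach: [^>]* stops at the ">" inside the attribute value,
// so the capture begins in the wrong place.
preg_match('/<p[^>]*>(.*?)<\/p>/is', $html, $m);
echo $m[1], "\n";  // captures ' b">Hello' instead of 'Hello'

// DOM approach: the parser understands attribute quoting.
$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);
echo $doc->getElementsByTagName('p')->item(0)->textContent, "\n";  // Hello
```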

  • 2021-01-28 04:22

    Write your PHP function getTagContent like this:

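    function getTagContent($string, $tagname) {
        $tag = preg_quote($tagname, '/');  // escape any regex metacharacters in the tag name
        // [^>]* skips attributes, (.*?) is non-greedy, i = case-insensitive, s = . matches newlines
        $pattern = '/<'.$tag.'[^>]*>(.*?)<\/'.$tag.'>/is';
        preg_match($pattern, $string, $matches);
        print_r($matches);
    }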
    

    It is important to use the non-greedy .*? to match the text between the opening and closing tags, and equally important to use the s flag (DOTALL, so that . also matches newlines) and the i flag (case-insensitive matching).
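A short sketch comparing greedy and non-greedy captures with patterns of the same shape as the one above, on a made-up string containing two <b> elements in different letter case (the input and variable names are illustrative only).

```php
<?php
// Made-up input: two <b> elements, one uppercase, one spanning lines.
$html = "<B>first</B> and <b>\nsecond\n</b>";

// Greedy (.*) runs on to the LAST closing tag in the string.
preg_match('/<b[^>]*>(.*)<\/b>/is', $html, $greedy);

// Non-greedy (.*?) stops at the FIRST closing tag.
preg_match('/<b[^>]*>(.*?)<\/b>/is', $html, $lazy);

// preg_match_all with the non-greedy pattern collects every element.
preg_match_all('/<b[^>]*>(.*?)<\/b>/is', $html, $all);

var_dump($greedy[1]);  // "first</B> and <b>\nsecond\n"
var_dump($lazy[1]);    // "first"
var_dump($all[1]);     // ["first", "\nsecond\n"]
```

Note that a pattern like <b[^>]*> would also match <br> or <body>, so for real pages the DOM approach is likely the more robust choice.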
