How to remove a tag and its contents using a regular expression?

囚心锁ツ 2021-02-20 06:09

$str = 'some text <MY_TAG>tag <em>contents</em></MY_TAG> more text';

My questions are: how do I retrieve the content between <MY_TAG> and </MY_TAG>, and how do I remove <MY_TAG> and its contents from $str?

5 answers
  • 2021-02-20 06:28

    You do not want to use regular expressions for this. A much better solution would be to load your contents into a DOMDocument and work on it using the DOM tree and standard DOM methods:

    $document = new DOMDocument();
    $document->loadXML('<root/>');
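    // Parse the markup into a fragment, then attach it under <root>.
    $fragment = $document->createDocumentFragment();
    $fragment->appendXML($myTextWithTags);
    $document->documentElement->appendChild($fragment);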
    
    $MY_TAGs = $document->getElementsByTagName('MY_TAG');
    foreach($MY_TAGs as $MY_TAG)
    {
        $xmlContent = $document->saveXML($MY_TAG);
        /* work on $xmlContent here */
    
        /* as a further example: */
        $ems = $MY_TAG->getElementsByTagName('em');
        foreach($ems as $em)
        {
            $emphasizedText = $em->nodeValue;
            /* do your operations here */
        }
    }
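
    The same DOM approach can also cover the removal half of the question. The following is a minimal sketch, not a definitive implementation: the sample string and the variable names $toRemove, $contents and $result are assumptions made for illustration, and appendXML() only accepts well-formed XML.

    ```php
    <?php
    // Hypothetical input; it must be well-formed XML for appendXML() to accept it.
    $myTextWithTags = 'some text <MY_TAG>tag <em>contents</em></MY_TAG> more text';

    $document = new DOMDocument();
    $document->loadXML('<root/>');
    $fragment = $document->createDocumentFragment();
    $fragment->appendXML($myTextWithTags);
    $document->documentElement->appendChild($fragment);

    // Copy the live node list first: removing nodes while iterating it skips items.
    $toRemove = iterator_to_array($document->getElementsByTagName('MY_TAG'));
    $contents = [];
    foreach ($toRemove as $node) {
        $contents[] = $node->textContent;      // text inside the tag, markup dropped
        $node->parentNode->removeChild($node); // drop the tag and everything in it
    }

    // Serialize the remaining children of <root> back into a string.
    $result = '';
    foreach ($document->documentElement->childNodes as $child) {
        $result .= $document->saveXML($child);
    }
    ```

    Here $contents collects what was inside each MY_TAG and $result holds the text with those elements removed.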
    
  • 2021-02-20 06:36

    For removal I ended up just using this:

    $str = preg_replace('~<MY_TAG(.*?)</MY_TAG>~si', "", $str);
    

    Using ~ instead of / as the delimiter avoids the errors caused by the forward slash in the closing tag, which otherwise has to be escaped as \/. Leaving > out of the opening tag allows for attributes or other characters and still matches the tag and all of its contents.

    This only works where nesting is not a concern.

    The s and i modifiers mean: s = the dot also matches line-break characters, i = case insensitive. The lazy quantifier .*? already makes the match ungreedy; the U modifier is left out because it inverts greediness and would turn .*? back into a greedy match.
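
    To see where this kind of pattern holds and where it breaks, the sketch below runs it on two made-up inputs (the strings and variable names are illustrative assumptions): two sibling tags, and a tag nested inside itself. It also compares .*? with and without the U modifier, since in PCRE the U modifier inverts greediness and so makes .*? greedy.

    ```php
    <?php
    $siblings = 'x <MY_TAG>1</MY_TAG> keep <MY_TAG>2</MY_TAG> y';
    $nested   = 'a <MY_TAG>outer <MY_TAG>inner</MY_TAG> tail</MY_TAG> b';

    $lazy  = '~<MY_TAG(.*?)</MY_TAG>~si';  // lazy quantifier, no U modifier
    $lazyU = '~<MY_TAG(.*?)</MY_TAG>~Usi'; // U flips .*? back to greedy

    // Each sibling is removed separately, so "keep" survives.
    $siblingsLazy = preg_replace($lazy, '', $siblings);

    // One greedy match runs from the first opening tag to the last closing tag.
    $siblingsU = preg_replace($lazyU, '', $siblings);

    // The lazy match stops at the first </MY_TAG>, leaving the outer remainder.
    $nestedLazy = preg_replace($lazy, '', $nested);
    ```

    The leftover " tail</MY_TAG>" in $nestedLazy is the nesting problem mentioned above.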

  • 2021-02-20 06:44

    If MY_TAG cannot be nested, try this to get the matches:

    preg_match_all('/<MY_TAG>(.*?)<\/MY_TAG>/s', $str, $matches);
    

    And to remove them, use preg_replace instead.
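
    As a rough illustration of both calls side by side (the sample string and the names $inner and $stripped are assumptions, not part of the question):

    ```php
    <?php
    $str = 'some text <MY_TAG>tag <em>contents</em></MY_TAG> more <MY_TAG>second</MY_TAG> text';

    // $matches[0] holds the full matches, $matches[1] the captured inner contents.
    preg_match_all('/<MY_TAG>(.*?)<\/MY_TAG>/s', $str, $matches);
    $inner = $matches[1];

    // The same pattern with preg_replace removes each tag together with its contents.
    $stripped = preg_replace('/<MY_TAG>(.*?)<\/MY_TAG>/s', '', $str);
    ```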

  • 2021-02-20 06:45

    Although the only fully correct way to do this is not to use regular expressions, you can get what you want if you accept it won't handle all special cases:

    preg_match('~<em[^>]*?>.*?</em>~i', $str, $match);
    // Use this only if you aren't worried about nested tags.
    // It will handle tags with attributes
    

    And

    $str = preg_replace('~<MY_TAG[^>]*?>.*?</MY_TAG>~i', '', $str);
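
    One special case these patterns do not handle is a tag whose name merely starts with the same letters, for example <embed> when targeting <em>. A possible tightening is sketched below; remove_tag_block() is a hypothetical helper written for this example, not a PHP built-in, and it still assumes the tag is never nested inside itself.

    ```php
    <?php
    // Hypothetical helper (not part of PHP): removes <tag ...>...</tag> blocks.
    function remove_tag_block(string $html, string $tag): string
    {
        $name = preg_quote($tag, '~');
        // \b ends the tag name, so "em" cannot match "<embed"; [^>]* skips attributes.
        $pattern = '~<' . $name . '\b[^>]*>.*?</' . $name . '\s*>~is';
        // preg_replace() returns null on error; fall back to the input in that case.
        return preg_replace($pattern, '', $html) ?? $html;
    }

    $html = 'keep <embed src="a"/> this <em class="x">drop</em> end';

    // Without \b the match already starts at "<embed" and swallows " this ".
    $loose  = preg_replace('~<em[^>]*?>.*?</em>~i', '', $html);
    $strict = remove_tag_block($html, 'em');
    ```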
    
  • 2021-02-20 06:50

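    I tested this function and it worked for nested tags too. Pass TRUE or FALSE as the third argument ($invert): FALSE (the default) keeps the listed tags and removes all others, TRUE removes only the listed tags. Found here: https://www.php.net/manual/en/function.strip-tags.php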

    <?php
    function strip_tags_content($text, $tags = '', $invert = FALSE) {
    
      preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);
      $tags = array_unique($tags[1]);
       
      if(is_array($tags) AND count($tags) > 0) {
        if($invert == FALSE) {
          return preg_replace('@<(?!(?:'. implode('|', $tags) .')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
        }
        else {
          return preg_replace('@<('. implode('|', $tags) .')\b.*?>.*?</\1>@si', '', $text);
        }
      }
      elseif($invert == FALSE) {
        return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
      }
      return $text;
    }
    
    // Sample text:
    $text = '<b>sample</b> text with <div>tags</div>';
    
    // Result for:
    echo strip_tags_content($text);
    // text with
    
    // Result for:
    echo strip_tags_content($text, '<b>');
    // <b>sample</b> text with
    
    // Result for:
    echo strip_tags_content($text, '<b>', TRUE);
    // text with <div>tags</div>
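
    One case worth checking separately is a tag nested inside itself: a lazy .*? followed by a closing-tag backreference ends at the first matching closing tag, so the rest of the outer element is left behind. A minimal sketch of one alternative, assuming PCRE recursion with (?R) is acceptable and the markup is balanced (the input string and variable names are made up for illustration):

    ```php
    <?php
    // Hypothetical input: MY_TAG nested inside itself, plus a separate sibling.
    $str = 'x <MY_TAG>a<MY_TAG id="2">b</MY_TAG>c</MY_TAG> y <MY_TAG>z</MY_TAG> end';

    // Between the outer tags, accept either characters that do not start a MY_TAG
    // tag, or (via (?R)) another complete, balanced MY_TAG block.
    $balanced = '~<MY_TAG\b[^>]*>(?:(?:(?!</?MY_TAG\b).)++|(?R))*</MY_TAG>~si';

    $result = preg_replace($balanced, '', $str);
    ```

    For markup that may be unbalanced or malformed, a DOM parser remains the safer choice.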
    