How to remove an entire div with preg_replace

心在旅途 2021-01-16 17:06

OK, since this is a WordPress problem that sadly goes a little deeper, I need to remove every occurrence of the parent div and everything inside it.

4 Answers
  • 2021-01-16 17:27

    I wouldn't use a regular expression. Instead, I would use the DOMDocument class. Just find all of the div elements with that class, and remove them from their parent(s):

    $html = "<p>Hello World</p>
             <div class='sometestclass'>
               <img src='foo.png'/>
               <div>Bar</div>
             </div>";
    
    $dom = new DOMDocument;
    $dom->loadHTML( $html );
    
    $xpath = new DOMXPath( $dom );
    $pDivs = $xpath->query(".//div[@class='sometestclass']");
    
    foreach ( $pDivs as $div ) {
      $div->parentNode->removeChild( $div );
    }
    
    echo preg_replace( "/.*<body>(.*)<\/body>.*/s", "$1", $dom->saveHTML() );
    

    Which results in:

    <p>Hello World</p>
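
    If the div may carry more than one class, the exact-match query above will miss it. Here is a minimal sketch of a variant, assuming the input is an HTML fragment and the class name contains no quotes. The function name remove_divs_by_class is hypothetical, and the class-token XPath test and the per-node saveHTML() calls are substitutions for illustration, not part of the answer above:

    ```php
    <?php
    // Hypothetical helper: removes every <div> whose class list contains $class,
    // even when other classes are present on the same element.
    function remove_divs_by_class(string $html, string $class): string
    {
        // loadHTML() rejects an empty string, so return early.
        if (trim($html) === '') {
            return '';
        }

        $dom = new DOMDocument();

        // Collect parser warnings internally instead of printing them.
        $previous = libxml_use_internal_errors(true);
        $dom->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        // Match $class as a whole token inside the class attribute.
        // Assumption: $class is a plain class name without quotes.
        $xpath = new DOMXPath($dom);
        $query = sprintf(
            "//div[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
            $class
        );

        foreach ($xpath->query($query) as $div) {
            if ($div->parentNode !== null) {
                $div->parentNode->removeChild($div);
            }
        }

        // Serialise only the children of <body>, so no regex is needed
        // to strip the wrapper that loadHTML() adds.
        $body = $dom->getElementsByTagName('body')->item(0);
        if ($body === null) {
            return '';
        }

        $out = '';
        foreach ($body->childNodes as $child) {
            $out .= $dom->saveHTML($child);
        }

        return $out;
    }

    $html = "<p>Hello World</p>
             <div class='sometestclass extra'>
               <img src='foo.png'/>
               <div>Bar</div>
             </div>";

    echo remove_divs_by_class($html, 'sometestclass');
    ```

    The token test also avoids false positives on names that merely contain the string, such as a class called nosometestclass.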
    
  • 2021-01-16 17:28

    How about just hiding it with some CSS, such as .sometestclass { display: none; }?

  • 2021-01-16 17:28

    For the UTF-8 issue, I found a hack in the PHP manual.

    So my function looks as follows:

    function rem_fi_cat() {
        /* This function removes images from _within_ the article,
         * if these images are enclosed in a "wp-caption" div tag and
         * the post is formatted as "image".
         * Only on the home page, front page and category/archive pages.
         */
        if ( (is_home() || is_front_page() || is_category()) && has_post_format( 'image' ) ) {
            $document = new DOMDocument();
            $content = get_the_content( '', true );
            if ( '' != $content ) {
                /* incl. UTF-8 "hack" as described at
                 * http://www.php.net/manual/en/domdocument.loadhtml.php#95251
                 */
                $document->loadHTML( '<?xml encoding="UTF-8">' . $content );
                foreach ( $document->childNodes as $item ) {
                    if ( $item->nodeType == XML_PI_NODE ) {
                        $document->removeChild( $item ); // remove hack
                        $document->encoding = 'UTF-8';   // insert proper
                    }
                }

                // Remove every caption div together with its contents.
                $xpath = new DOMXPath( $document );
                $pDivs = $xpath->query(".//div[@class='wp-caption']");

                foreach ( $pDivs as $div ) {
                    $div->parentNode->removeChild( $div );
                }

                echo preg_replace( "/.*<div class=\"entry-container\">(.*)<\/div>.*/s", "$1", $document->saveHTML() );
            }
        }
    }

  • 2021-01-16 17:50
    <?php $content = preg_replace('/<div class="sometestclass">.*?<\/div><!-- END: .sometestclass -->/s','',$content); ?>
    

    My RegEx is a bit rusty, but I think this should work. Do note that, as others have said, RegEx is not properly equipped to handle some of the complexities of HTML.

    In addition, this pattern won't find embedded div elements with the class sometestclass. You would need recursion for that.
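
    To make that limitation concrete, here is a small sketch, assuming the theme really emits the END comment after the closing tag, as the pattern above expects. The "naive" pattern without the comment and all variable names are my own illustration:

    ```php
    <?php
    // Target div containing an ordinary nested div.
    $plain = '<p>Hello</p>'
           . '<div class="sometestclass"><div>Bar</div></div><!-- END: .sometestclass -->'
           . '<p>Bye</p>';

    // Naive pattern: the lazy match stops at the FIRST closing </div>.
    $naive = '/<div class="sometestclass">.*?<\/div>/s';

    // Sentinel pattern: the trailing comment marks the real end of the block.
    $sentinel = '/<div class="sometestclass">.*?<\/div><!-- END: \.sometestclass -->/s';

    $naiveResult = preg_replace($naive, '', $plain);
    // '<p>Hello</p></div><!-- END: .sometestclass --><p>Bye</p>' (stray closing tag left behind)

    $sentinelResult = preg_replace($sentinel, '', $plain);
    // '<p>Hello</p><p>Bye</p>'

    // A sometestclass div nested inside another one defeats the sentinel too:
    // the match ends at the inner END comment.
    $sameClass = '<div class="sometestclass">A'
               . '<div class="sometestclass">B</div><!-- END: .sometestclass -->'
               . '</div><!-- END: .sometestclass -->';

    $sameClassResult = preg_replace($sentinel, '', $sameClass);
    // '</div><!-- END: .sometestclass -->'

    echo $naiveResult, "\n", $sentinelResult, "\n", $sameClassResult, "\n";
    ```

    So the END comment handles ordinary nested markup, but not a sometestclass div inside another sometestclass div.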
