Convert spaces between PRE tags, via DOM parser

小蘑菇 2021-01-15 22:15

Regex was my original idea for a solution, although it soon became apparent that a DOM parser would be more appropriate... I'd like to convert spaces to &nbsp; b…

2 Answers
  •  时光说笑
    2021-01-15 22:43

    I see the shortcoming of my previous answer. Here is a workaround that preserves tags inside the <pre> tag:

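    <?php
    // sample input (the exact tags are assumed; they match the Input listing)
    $test = '<p>paragraph 1 remains untouched</p>
    <pre>preformatted 1</pre>
    <pre>preformatted 2</pre>
    <pre>preformatted 3 <span>span text</span> preformatted 3</pre>
    <pre>preformatted 4 <span>span <b>bold test</b> text</span> preformatted 3</pre>';
    $dom = new DOMDocument();
    $dom->loadHTML($test);
    $xpath = new DOMXPath($dom);
    $pre = $xpath->query('//pre//text()');
    // manipulate nodes of type XML_TEXT_NODE
    foreach ($pre as $e) {
        // use a placeholder: when you attempt to write &nbsp; in a DOM node,
        // the & will be converted to &amp; on output :(
        $e->nodeValue = str_replace(' ', '__REPLACEMELATER__', $e->nodeValue);
    }
    $temp = $dom->saveHTML();
    // strip the doctype, <html> and <body> wrappers that loadHTML() adds
    $temp = preg_replace('/<!DOCTYPE[^>]*>/i', '', $temp);
    $temp = str_replace(array('<html>', '</html>', '<body>', '</body>'), '', $temp);
    $temp = str_replace('__REPLACEMELATER__', '&nbsp;', $temp);
    echo $temp;
    ?>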
    

    Input

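    <p>paragraph 1 remains untouched</p>
    <pre>preformatted 1</pre>
    <pre>preformatted 2</pre>
    <pre>preformatted 3 <span>span text</span> preformatted 3</pre>
    <pre>preformatted 4 <span>span <b>bold test</b> text</span> preformatted 3</pre>

    Output

    <p>paragraph 1 remains untouched</p>
    <pre>preformatted&nbsp;1</pre>
    <pre>preformatted&nbsp;2</pre>
    <pre>preformatted&nbsp;3&nbsp;<span>span&nbsp;text</span>&nbsp;preformatted&nbsp;3</pre>
    <pre>preformatted&nbsp;4&nbsp;<span>span&nbsp;<b>bold&nbsp;test</b>&nbsp;text</span>&nbsp;preformatted&nbsp;3</pre>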
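The placeholder detour in the code above exists because of the escaping mentioned in its comments. A minimal sketch that reproduces the problem on a one-element sample input (the input and variable names are illustrative): a DOM text node stores whatever string you assign to it literally, so an &nbsp; written into it is not treated as an entity and its ampersand gets escaped on output.

```php
<?php
// Reproduces the pitfall the placeholder works around (illustrative input).
$dom = new DOMDocument();
$dom->loadHTML('<pre>a b</pre>');

// The <pre> element's only child is the text node "a b".
$text = $dom->getElementsByTagName('pre')->item(0)->firstChild;

// Text nodes hold literal text, so "&nbsp;" is NOT parsed as an entity here...
$text->nodeValue = str_replace(' ', '&nbsp;', $text->nodeValue);

// ...and when serialized, the "&" is escaped: <pre>a&amp;nbsp;b</pre>
$out = $dom->saveHTML();
echo $out;
```

Hence the answer writes a neutral marker into the DOM and swaps it for &nbsp; only after serialization.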

    Note #1

    In PHP >= 5.3.6, DOMDocument::saveHTML() accepts an optional node argument and outputs only that node. On older versions, you can use str_replace() or preg_replace() to eliminate the doctype, html and body tags.
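A minimal sketch of the node-argument approach, assuming PHP >= 5.3.6 and a small illustrative input: serialize each child of <body> separately so the wrappers added by loadHTML() never reach the output. It reuses the answer's placeholder trick; `$html` and the sample markup are hypothetical names chosen for the example.

```php
<?php
// Serialize only the <body> children with saveHTML($node) (PHP >= 5.3.6),
// so the doctype, <html> and <body> wrappers never appear in the output.
$test = '<p>paragraph one</p><pre>pre one</pre>';

$dom = new DOMDocument();
$dom->loadHTML($test);

// Same placeholder trick as the answer: mark every space inside <pre>.
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//pre//text()') as $e) {
    $e->nodeValue = str_replace(' ', '__REPLACEMELATER__', $e->nodeValue);
}

// loadHTML() wraps fragments in <html><body>, so collect the body's children.
$body = $dom->getElementsByTagName('body')->item(0);
$html = '';
foreach ($body->childNodes as $child) {
    $html .= $dom->saveHTML($child);
}

$html = str_replace('__REPLACEMELATER__', '&nbsp;', $html);
echo $html;
```

Compared with stripping tags by string replacement, this does not depend on the exact doctype string libxml emits.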

    Note #2

    This trick seems to work and saves a line of code, but I am not sure it is guaranteed to work:

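    $e->nodeValue = str_replace(' ', "\xC2\xA0", $e->nodeValue);
    // the DOM library will attempt to convert U+00A0 to &nbsp; on output (saveHTML() with no charset declared)
    // nodeValue expects UTF-8 encoded data, and a bare 0xA0 byte is not valid UTF-8,
    // so use its UTF-8 encoding "\xC2\xA0" directly (utf8_encode() would also re-encode
    // any non-ASCII UTF-8 text already in the node, and is deprecated as of PHP 8.2)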
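A self-contained sketch of this no-placeholder variant, assuming the input declares no charset (the sample markup is illustrative). How U+00A0 is serialized depends on the libxml2 version and on the document's declared encoding, so it is worth checking the output on your own setup; with no charset declared, saveHTML() typically emits &nbsp;.

```php
<?php
// No-placeholder variant: write U+00A0 directly into the <pre> text nodes.
// Assumes the input declares no charset, so saveHTML() typically writes
// non-ASCII characters out as entities.
$test = '<p>keep me</p><pre>a b c</pre>';

$dom = new DOMDocument();
$dom->loadHTML($test);

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//pre//text()') as $e) {
    // "\xC2\xA0" is the UTF-8 encoding of U+00A0 (NO-BREAK SPACE).
    $e->nodeValue = str_replace(' ', "\xC2\xA0", $e->nodeValue);
}

// Whole-document saveHTML(); typically prints <pre>a&nbsp;b&nbsp;c</pre>.
$out = $dom->saveHTML();
echo $out;
```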
    
