PHP regex: is there anything wrong with this code?

长发绾君心 2020-12-20 07:16

preg_replace_callback('#<(code|pre)([^>]*)>(((?!#si', 'self::replaceit', $text);


I'm trying to r

2 Answers
  •  囚心锁ツ
    2020-12-20 07:50

    is there anything wrong with this code?

    Yes. You're trying to parse HTML with a regex. Tsk, tsk, tsk. Let's not summon Zalgo quite yet.

    You should be using the DOM.

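    // Parse the markup into a DOM tree instead of matching it with a regex
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep libxml warnings about sloppy/fragment HTML quiet
    $doc->loadHTML($text);
    $code_tags = $doc->getElementsByTagName('code');
    $pre_tags = $doc->getElementsByTagName('pre');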
    

    This leaves you with two DOMNodeList objects whose contents you can process however you like. If you're encountering &lt; and friends in the textContent (or when re-serializing the contents using saveXML), and you need the actual tags, consider htmlspecialchars_decode.
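
    To make "process the contents" concrete, here is a minimal sketch that walks both node lists and collects each element's text. The function name collectCodeBlocks and the sample markup are hypothetical; the DOM calls themselves (loadHTML, getElementsByTagName, textContent) are the standard DOMDocument API.

    ```php
    <?php
    // Hypothetical helper: collect [tag, text] pairs for every <code> and <pre>.
    function collectCodeBlocks(string $html): array
    {
        $doc = new DOMDocument();
        // Fragments and sloppy markup make libxml emit warnings; buffer them instead.
        $previous = libxml_use_internal_errors(true);
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $blocks = [];
        foreach (['code', 'pre'] as $tag) {
            // DOMNodeList is Traversable, so foreach works on it directly.
            foreach ($doc->getElementsByTagName($tag) as $node) {
                $blocks[] = [$tag, $node->textContent];
            }
        }
        return $blocks;
    }

    // Placeholder input, not taken from the question.
    $sample = '<p>Intro</p><pre>line 1</pre><code>echo 1;</code>';
    foreach (collectCodeBlocks($sample) as [$tag, $text]) {
        echo "$tag: $text\n";
    }
    ```

    Note that a <code> nested inside a <pre> shows up in both lists, so you may want to skip one of them depending on what you are replacing.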


    Getting the first and last element in $code_tags, which is a DOMNodeList:

    // item() returns null when the index is not valid (e.g. the list is empty)
    $first_code_tag = $code_tags->item(0);
    $last_code_tag = $code_tags->item($code_tags->length - 1);
    

    While you can treat a node list like an array inside a foreach, it isn't directly indexable, hence the use of the length property and the item() method. Be aware that when there's only one item in the list, the first and last node will be identical. Thankfully, you can simply check whether $code_tags->length is greater than one before looking at the last element in addition to the first.
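
    A small sketch of those length checks, wrapped in a hypothetical helper named firstAndLast. It returns nulls for an empty list and the same node twice when there is exactly one match; the sample markup is a placeholder.

    ```php
    <?php
    // Hypothetical helper: first and last node of a DOMNodeList, with guards.
    function firstAndLast(DOMNodeList $list): array
    {
        if ($list->length === 0) {
            return [null, null];            // nothing matched
        }
        $first = $list->item(0);
        // With exactly one match, first and last are the same node.
        $last = $list->length > 1 ? $list->item($list->length - 1) : $first;
        return [$first, $last];
    }

    $doc = new DOMDocument();
    $doc->loadHTML('<code>one</code><code>two</code><code>three</code>');
    [$first, $last] = firstAndLast($doc->getElementsByTagName('code'));
    echo $first->textContent, ' / ', $last->textContent, "\n";
    ```

    Comparing nodes with isSameNode() rather than by text is the safer way to tell whether you are looking at a single match.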

    I'm not sure this is going to help you. Based on your other questions, it sounds like you're using this methodology to work on BBCode, and that you've turned the square brackets into less-than and greater-than. This isn't a problem, mind you, but it might make life interesting.
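
    If that is the setup, one way to keep it manageable is to escape the user's text first and only then turn a whitelist of bracket tags into real tags, so the DOM sees exactly the tags you allowed. This is a sketch under loud assumptions: the function name bbcodeToDom is hypothetical, and it assumes the BBCode only uses bare [code] and [pre] tags with no attributes.

    ```php
    <?php
    // Hypothetical pipeline: BBCode text -> HTML-ish markup -> DOM.
    // ASSUMPTION: only bare [code]/[pre] tags (no attributes) are allowed.
    function bbcodeToDom(string $bbcode): DOMDocument
    {
        // Escape first so any literal "<" typed by the user stays plain text.
        $safe = htmlspecialchars($bbcode, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

        // Then turn only the whitelisted square-bracket tags into real tags.
        $markup = strtr($safe, [
            '[code]' => '<code>', '[/code]' => '</code>',
            '[pre]'  => '<pre>',  '[/pre]'  => '</pre>',
        ]);

        $doc = new DOMDocument();
        $previous = libxml_use_internal_errors(true);
        // Wrap in a div so loose text and tags share one parent element.
        $doc->loadHTML('<div>' . $markup . '</div>');
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
        return $doc;
    }

    $doc = bbcodeToDom('Try [code]if (a < b) {}[/code] here');
    echo $doc->getElementsByTagName('code')->item(0)->textContent, "\n";
    ```

    Non-ASCII input may need an explicit charset hint, since loadHTML() assumes ISO-8859-1 when the markup doesn't declare one.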

    Try inspecting the output of:

    echo $doc->saveXML($first_code_tag);
    

    to see if it's giving you the content that you expect.
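
    When checking that output, it can help to look at three views of the same node side by side: the outer markup from saveXML(), the inner markup (children only), and the decoded text. A minimal sketch; innerMarkup is a hypothetical helper built on saveHTML(), which accepts a node argument as of PHP 5.3.6, and the sample markup is a placeholder.

    ```php
    <?php
    // Hypothetical helper: serialize only a node's children.
    function innerMarkup(DOMNode $node): string
    {
        $out = '';
        foreach ($node->childNodes as $child) {
            $out .= $node->ownerDocument->saveHTML($child);
        }
        return $out;
    }

    $doc = new DOMDocument();
    $doc->loadHTML('<code>a <b>bold</b> &amp; b</code>');
    $code = $doc->getElementsByTagName('code')->item(0);

    echo $doc->saveXML($code), "\n";   // outer markup, including <code>
    echo innerMarkup($code), "\n";     // children only, entities re-escaped
    echo $code->textContent, "\n";     // tags dropped, entities decoded
    ```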
