Text replacement: PHP/regex

[亡魂溺海] submitted on 2019-12-02 07:05:47

I was about to post an answer on your next question, but Casimir closed it before I got the chance. I am coming back here to post a proper HTML parse-then-replace technique for the benefit of researchers and you.

Code: (Demo)

define('LOOKUP', [
    'block' => [
        'welcome-intro'         => 'custom intro'
    ],
    'variable' => [
        'contact-email-address' => 'mmu@mmu.com',
        'system_version'        => 'sys ver',
        'system_name'           => 'sys name',
        'system_login'          => 'sys login',
        'activate_url'          => 'some url'
    ],
]);

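// $html holds the page markup from the question.
$dom = new DOMDocument();
libxml_use_internal_errors(true);  // collect parser warnings instead of printing them
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Text nodes of elements that are not textarea/select/input and whose text contains {{{
foreach ($xpath->query("//*[not(self::textarea or self::select or self::input) and contains(., '{{{')]/text()") as $node) {
    // Swap each {{{group:key}}} placeholder for its LOOKUP value.
    $node->nodeValue = preg_replace_callback(
        '~{{{([^:]+):([^}]+)}}}~',
        function ($m) {
            return LOOKUP[$m[1]][$m[2]] ?? '**unknown variable**';
        },
        $node->nodeValue
    );
}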
echo $dom->saveHTML();

Output:

<!DOCTYPE html>
<html lang="en"><head><meta charset="utf-8"><title>Test</title></head><body>
    <section id="about"><div class="container about-container">
            <div class="row">
                <div class="col-md-12">
                    custom intro
                </div>
            </div>
        </div>
    </section><section id="services"><div class="container">
            <div class="row">
                <div class="col-md-12">
                                        <p>You are using system version: sys ver</p>
                    <p>Your address: mmu@mmu.com</p>
                    <form action="http://k.loc/content/view/welcome" class="default-form" enctype="multipart/form-data" method="post" accept-charset="utf-8">
                                                                                    <input type="hidden" name="csrfkcmstoken" value="94ee71ada809b9a79d1b723c81020c78"><div class="row">
                            <div class="col-sm-12 form-error"></div>
                        </div>
                    <div class="row"><div class="col-sm-12"><fieldset id="personalinfo"><legend>Personal information</legend><div class="row"><div class="col-sm-12">
                    <div class="control-label">
                        <label for="testinput">Name<span class="form-validation-required"> * </span></label>

                    </div>
                <div class="hint-text">Enter at least 2 characters and a maximum of 12 characters.</div><input id="testinput" name="testinput" placeholder="Enter your name here." class="input-group width-50" type="text" value="{{{variable:system_name}}}  {{{variable:system_login}}}"><div class="row"><div class="col-sm-12"><div class="form-error"></div></div></div></div></div><div class="row"><div class="col-sm-12">
                    <div class="control-label">
                        <label for="testpassword">Password</label>

                    </div>
                <div class="hint-text">Your password must be at least 12 characters long, contain 1 special character, 1 nunber, 1 lower case character and 1 upper case character.</div><input id="testpassword" name="testpassword" placeholder="Enter your password here." class="input-group width-50" type="password"><div class="row"><div class="col-sm-12"><div class="form-error"></div></div></div></div></div></fieldset></div></div><div class="row"><div class="col-sm-12"><fieldset id="bioinfo"><legend>Biographical information</legend><div class="row"><div class="col-sm-12">
                    <div class="control-label">
                        <label for="testtextarea">Biography</label>
                <span class="hint-text">A minimum of 40 characters and a maximum of 255 is allowed. This hint is displayed inline.</span>
                    </div>
                <textarea id="testtextarea" name="testtextarea" placeholder="Please enter your biography here." class="input-group-wide width-100" rows="5" cols="80">{{{variable:system_name}}}

{{{variable:system_login}}}</textarea><div class="row"><div class="col-sm-12"><div class="form-error"></div></div></div></div></div><div class="row"><div class="col-sm-12">
                    <div class="control-label">
                        <label for="testsummernote">Interests</label>
                <span class="hint-text">A minimum of 40 characters is required. This hint is displayed inline.</span>
                    </div>
                <textarea id="testsummernote" name="testsummernote" class="wysiwyg-editor" placeholder="Please enter your interests here."><p>sys name<br></p><p>sys login</p><p>some url<br></p></textarea></div></div></fieldset></div></div><div class="row"><div class="col-sm-12"><button name="testsubmit" id="testsubmit" type="submit" class="btn primary">Submit<i class="zmdi zmdi-arrow-forward"></i></button></div></div>
        </form>                </div>
            </div>
        </div>
    </section></body></html>

There aren't too many tricks involved.

  1. Parse the HTML with DOMDocument and write an XPath query that selects the text nodes of elements which are not textarea, select, or input tags and which contain {{{ in their text. There are several "magical" ways to filter the DOM; this is just one that feels efficient and direct to me.

  2. I use preg_replace_callback() to perform replacements based on a lookup array.

  3. To avoid use() in the callback syntax, I make the lookup available inside the callback's scope by declaring it as a constant (I can't imagine you need it to be a variable anyhow).

  4. I found during testing that DOMDocument didn't like the <section> tags, so I silenced the complaints with libxml_use_internal_errors(true);.
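
The steps above can be tried on a much smaller document. The following is a minimal sketch rather than the code from the demo: the $html string and the $lookup variable are made up for illustration, and the lookup is passed into the closure with use() to show the alternative to the constant mentioned in step 3.

```php
<?php
// Hypothetical sample input; the real $html comes from the question.
$html = '<html><body>'
      . '<p>Version: {{{variable:system_version}}}</p>'
      . '<p>Missing: {{{variable:nope}}}</p>'
      . '<input type="text" value="{{{variable:system_version}}}">'
      . '</body></html>';

// Hypothetical lookup held in a variable instead of a constant.
$lookup = ['variable' => ['system_version' => 'sys ver']];

libxml_use_internal_errors(true);  // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Same filter as in the answer: skip form controls, require {{{ in the text.
$query = "//*[not(self::textarea or self::select or self::input) and contains(., '{{{')]/text()";
foreach ($xpath->query($query) as $node) {
    $node->nodeValue = preg_replace_callback(
        '~{{{([^:]+):([^}]+)}}}~',
        function ($m) use ($lookup) {
            // Fall back to a marker when the group or key is not in the lookup.
            return $lookup[$m[1]][$m[2]] ?? '**unknown variable**';
        },
        $node->nodeValue
    );
}

$result = $dom->saveHTML();
echo $result;
```

In this sketch the first paragraph receives the lookup value, the second receives the fallback marker, and the value attribute of the input keeps its placeholder, because attribute values are not text nodes and input is excluded by the query anyway.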

My guess is that you are likely designing an expression similar to:

<(?:textarea|select)[\s\S]*?({{variable:system_version}})[\s\S]*?<\/(?:textarea|select)>|<(?:input)[\s\S]*?({{variable:system_version}})[\s\S]*?>

You will probably want to modify it and then substitute whatever replacement you need.

The expression is explained in the top-right panel of regex101.com if you wish to explore, simplify, or modify it, and in this link you can watch how it matches against some sample inputs.

Test

$re = '/<(?:textarea|select)[\s\S]*?({{variable:system_version}})[\s\S]*?<\/(?:textarea|select)>|<(?:input)[\s\S]*?({{variable:system_version}})[\s\S]*?>/m';
$str = '<html>
    <head>
        <title>System version: 6.0</title>
    </head>
    <body>
        <p>You are using system version 6.0</p>
        <div>
            This was the content of the welcome block.
        </div>
        <form>
            <input value="System version: {{variable:system_version}}">
            <textarea>
                You are using system version {{variable:system_version}}.
            </textarea>
        </form>
    </body>
</html>';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

var_dump($matches);
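
If the end goal is the one from the first answer, replacing placeholders everywhere except inside textarea, select, and input elements (an assumption, since the question itself is not quoted here), one possible regex-only route is to consume the protected elements in the first branches of an alternation and capture the placeholder only in the last branch. The sketch below is a variation on the expression above, not the expression itself; the sample string and the replacement value 6.0 are made up.

```php
<?php
// Hypothetical sample input.
$str = '<p>Version {{variable:system_version}}</p>'
     . '<input value="{{variable:system_version}}">'
     . '<textarea>{{variable:system_version}}</textarea>';

// Branches 1 and 2 consume whole form controls; branch 3 captures the placeholder.
$re = '~<(?:textarea|select)\b[\s\S]*?</(?:textarea|select)>'
    . '|<input\b[^>]*>'
    . '|(' . preg_quote('{{variable:system_version}}', '~') . ')~';

$result = preg_replace_callback($re, function ($m) {
    // Group 1 is only set when the placeholder matched outside a form control;
    // otherwise the matched control is returned unchanged.
    return isset($m[1]) ? '6.0' : $m[0];
}, $str);

echo $result;
```

Here the paragraph text becomes "Version 6.0" while the input and the textarea keep their placeholders. Like any regex over HTML, this can be fooled by a > inside an attribute value, which is one reason the parser-based answer is the safer choice.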


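Edit, for doing it in two steps: the first expression matches the whole elements, and the next two then check a matched input tag for the placeholder.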

<(?:textarea|select)[\s\S]*?>[\s\S]*?<\/(?:textarea|select)>|<(?:input)[\s\S]*?>

Demo 1

^<(?:input)[\s\S]*?({{variable:system_version}})[\s\S]*?>$

Demo 2

^<(?:input).*?({{variable:system_version}}).*?>$

Demo 3
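
A rough sketch of how the two steps might be chained in PHP, assuming that the first expression is used to extract the elements and the second is then applied to each extracted element. The sample string and the variable names are invented for illustration.

```php
<?php
// Hypothetical sample input.
$str = '<form>'
     . '<input value="System version: {{variable:system_version}}">'
     . '<input value="plain">'
     . '<textarea>Uses {{variable:system_version}}.</textarea>'
     . '</form>';

// Step 1: collect every textarea/select element and every input tag.
$step1 = '/<(?:textarea|select)[\s\S]*?>[\s\S]*?<\/(?:textarea|select)>|<(?:input)[\s\S]*?>/';
preg_match_all($step1, $str, $matches);

// Step 2: keep only the input tags that contain the placeholder.
$step2 = '/^<(?:input)[\s\S]*?({{variable:system_version}})[\s\S]*?>$/';
$hits = [];
foreach ($matches[0] as $element) {
    if (preg_match($step2, $element)) {
        $hits[] = $element;
    }
}

print_r($hits);
```

Step 1 finds three elements in this sample, and step 2 keeps only the first input. The textarea is dropped in step 2 because that expression is anchored to input tags; a similar anchored expression would be needed to test textarea or select elements. The third expression differs from the second only in using . instead of [\s\S], so without the s flag it does not match across line breaks.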
