Text replacement: PHP/regex

滥情空心 2021-01-23 23:38

I am presented with an HTML document similar to this in view source mode (the below is simplified for brevity):


    
        

        
2 Answers
  • 2021-01-24 00:03

    My guess is that you are looking for an expression similar to:

    <(?:textarea|select)[\s\S]*?({{variable:system_version}})[\s\S]*?<\/(?:textarea|select)>|<(?:input)[\s\S]*?({{variable:system_version}})[\s\S]*?>
    

    You will probably want to modify it, and then replace the matches with whatever you need.

    The expression is explained in the top-right panel of regex101.com if you wish to explore, simplify, or modify it; there you can also watch how it matches against some sample inputs.

    Test

    $re = '/<(?:textarea|select)[\s\S]*?({{variable:system_version}})[\s\S]*?<\/(?:textarea|select)>|<(?:input)[\s\S]*?({{variable:system_version}})[\s\S]*?>/m';
    $str = '<html>
        <head>
            <title>System version: 6.0</title>
        </head>
        <body>
            <p>You are using system version 6.0</p>
            <div>
                This was the content of the welcome block.
            </div>
            <form>
                <input value="System version: {{variable:system_version}}">
                <textarea>
                    You are using system version {{variable:system_version}}.
                </textarea>
            </form>
        </body>
    </html>';
    
    preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
    
    var_dump($matches);
    

    RegEx Circuit

    jex.im visualizes regular expressions (the diagram is not reproduced here).


    Edit: expressions for doing this in two steps:

    <(?:textarea|select)[\s\S]*?>[\s\S]*?<\/(?:textarea|select)>|<(?:input)[\s\S]*?>
    

    Demo 1

    ^<(?:input)[\s\S]*?({{variable:system_version}})[\s\S]*?>$
    

    Demo 2

    ^<(?:input).*?({{variable:system_version}}).*?>$
    

    Demo 3
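One way to read the two-step idea is sketched below: first match whole form fields, then replace the placeholder inside each match. This is only an illustration under assumptions: the helper name `replaceInFormFields` is hypothetical, the field pattern is a slightly tightened variant of the first expression above (`\b` and the `i` flag added), step two uses a plain `str_replace()` instead of the anchored expressions, and the goal is assumed to be replacing placeholders inside form fields only.

```php
<?php
// Hypothetical helper (not from the answer): replaces a literal placeholder
// only inside <textarea>/<select> elements and <input> tags.
function replaceInFormFields(string $html, string $placeholder, string $value): string
{
    // Step 1: match a whole textarea/select element, or a single input tag.
    $fieldPattern = '~<(?:textarea|select)\b[\s\S]*?</(?:textarea|select)>|<input\b[\s\S]*?>~i';

    // Step 2: inside each matched field, swap the literal placeholder.
    $result = preg_replace_callback(
        $fieldPattern,
        function (array $m) use ($placeholder, $value): string {
            return str_replace($placeholder, $value, $m[0]);
        },
        $html
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $html;
}

$html = '<p>Version {{variable:system_version}}</p>'
      . '<input value="System version: {{variable:system_version}}">'
      . '<textarea>Using {{variable:system_version}}.</textarea>';

// The <p> is left alone; the input and textarea receive "6.0".
echo replaceInFormFields($html, '{{variable:system_version}}', '6.0');
```

If the replacement value can contain characters such as `"` or `<`, it would probably need `htmlspecialchars()` before being placed into an attribute or a textarea.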

  • 2021-01-24 00:20

    I was about to post an answer to your next question, but Casimir closed it before I got the chance. I am coming back here to post a proper HTML parse-then-replace technique for the benefit of researchers and you.

    Code: (Demo)

    define('LOOKUP', [
        'block' => [
            'welcome-intro'         => 'custom intro'
        ],
        'variable' => [
            'contact-email-address' => 'mmu@mmu.com',
            'system_version'        => 'sys ver',
            'system_name'           => 'sys name',
            'system_login'          => 'sys login',
            'activate_url'          => 'some url'
        ],
    
    ]);
    
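    // $html is assumed to hold the HTML document to process.
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);  // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    
    // Select the text nodes of elements that are not form fields and that contain "{{{".
    foreach ($xpath->query("//*[not(self::textarea or self::select or self::input) and contains(., '{{{')]/text()") as $node) {
        // Replace each {{{type:key}}} placeholder with its entry in the LOOKUP constant.
        $node->nodeValue = preg_replace_callback('~{{{([^:]+):([^}]+)}}}~', function($m) {
                return LOOKUP[$m[1]][$m[2]] ?? '**unknown variable**';
            },
            $node->nodeValue);
    }
    echo $dom->saveHTML();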
    

    Output:

    <!DOCTYPE html>
    <html lang="en"><head><meta charset="utf-8"><title>Test</title></head><body>
        <section id="about"><div class="container about-container">
                <div class="row">
                    <div class="col-md-12">
                        custom intro
                    </div>
                </div>
            </div>
        </section><section id="services"><div class="container">
                <div class="row">
                    <div class="col-md-12">
                                            <p>You are using system version: sys ver</p>
                        <p>Your address: mmu@mmu.com</p>
                        <form action="http://k.loc/content/view/welcome" class="default-form" enctype="multipart/form-data" method="post" accept-charset="utf-8">
                                                                                        <input type="hidden" name="csrfkcmstoken" value="94ee71ada809b9a79d1b723c81020c78"><div class="row">
                                <div class="col-sm-12 form-error"></div>
                            </div>
                        <div class="row"><div class="col-sm-12"><fieldset id="personalinfo"><legend>Personal information</legend><div class="row"><div class="col-sm-12">
                        <div class="control-label">
                            <label for="testinput">Name<span class="form-validation-required"> * </span></label>
    
                        </div>
                    <div class="hint-text">Enter at least 2 characters and a maximum of 12 characters.</div><input id="testinput" name="testinput" placeholder="Enter your name here." class="input-group width-50" type="text" value="{{{variable:system_name}}}  {{{variable:system_login}}}"><div class="row"><div class="col-sm-12"><div class="form-error"></div></div></div></div></div><div class="row"><div class="col-sm-12">
                        <div class="control-label">
                            <label for="testpassword">Password</label>
    
                        </div>
                    <div class="hint-text">Your password must be at least 12 characters long, contain 1 special character, 1 nunber, 1 lower case character and 1 upper case character.</div><input id="testpassword" name="testpassword" placeholder="Enter your password here." class="input-group width-50" type="password"><div class="row"><div class="col-sm-12"><div class="form-error"></div></div></div></div></div></fieldset></div></div><div class="row"><div class="col-sm-12"><fieldset id="bioinfo"><legend>Biographical information</legend><div class="row"><div class="col-sm-12">
                        <div class="control-label">
                            <label for="testtextarea">Biography</label>
                    <span class="hint-text">A minimum of 40 characters and a maximum of 255 is allowed. This hint is displayed inline.</span>
                        </div>
                    <textarea id="testtextarea" name="testtextarea" placeholder="Please enter your biography here." class="input-group-wide width-100" rows="5" cols="80">{{{variable:system_name}}}
    
    {{{variable:system_login}}}</textarea><div class="row"><div class="col-sm-12"><div class="form-error"></div></div></div></div></div><div class="row"><div class="col-sm-12">
                        <div class="control-label">
                            <label for="testsummernote">Interests</label>
                    <span class="hint-text">A minimum of 40 characters is required. This hint is displayed inline.</span>
                        </div>
                    <textarea id="testsummernote" name="testsummernote" class="wysiwyg-editor" placeholder="Please enter your interests here."><p>sys name<br></p><p>sys login</p><p>some url<br></p></textarea></div></div></fieldset></div></div><div class="row"><div class="col-sm-12"><button name="testsubmit" id="testsubmit" type="submit" class="btn primary">Submit<i class="zmdi zmdi-arrow-forward"></i></button></div></div>
            </form>                </div>
                </div>
            </div>
        </section></body></html>
    

    There aren't too many tricks involved.

    1. Parse the HTML with DOMDocument and write an XPath query that selects the text nodes of elements that are not textarea, select, or input tags and that contain {{{ in their text. There are several "magical" ways to filter the DOM; this is just one that feels efficient and direct to me.

    2. I use preg_replace_callback() to perform replacements based on a lookup array.

    3. To avoid use() in the callback syntax, I make the lookup available inside the callback's scope by declaring it as a constant (I can't imagine you need it to be a variable anyhow).

    4. I found during testing that DOMDocument didn't like the <section> tags, so I silenced the complaints with libxml_use_internal_errors(true);.
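The four points above can be tried on a small input. The following is a minimal sketch, not the answer's original demo: the miniature `$html` string is hypothetical (the real input is not shown), and the lookup is held in a variable passed with `use ($lookup)` to show the alternative that point 3 avoids. The call to `libxml_clear_errors()` is an extra step that discards the buffered parser warnings.

```php
<?php
// Hypothetical miniature input; the answer's real $html is not shown.
$html = '<div><p>Version: {{{variable:system_version}}}</p>'
      . '<textarea>{{{variable:system_version}}}</textarea></div>';

// Assumption for this sketch: the lookup is a variable rather than a constant.
$lookup = ['variable' => ['system_version' => 'sys ver']];

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // buffer parser warnings instead of emitting them
$dom->loadHTML($html);
libxml_clear_errors();              // discard the buffered warnings
$xpath = new DOMXPath($dom);

// Same query as in the answer: text nodes of elements that are not form fields.
$query = "//*[not(self::textarea or self::select or self::input) and contains(., '{{{')]/text()";
foreach ($xpath->query($query) as $node) {
    $node->nodeValue = preg_replace_callback(
        '~{{{([^:]+):([^}]+)}}}~',
        function (array $m) use ($lookup) {
            // Unknown type/key pairs fall back to a visible marker.
            return $lookup[$m[1]][$m[2]] ?? '**unknown variable**';
        },
        $node->nodeValue
    );
}

$result = $dom->saveHTML();
echo $result;
```

With this input, the paragraph text should be replaced while the textarea keeps its placeholder, because the textarea element itself is excluded by the query.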
