Text replacement: PHP/regex

滥情空心 2021-01-23 23:38

I am presented with an HTML document similar to this in view source mode (the below is simplified for brevity):


    
        

        
2 Answers
  •  -上瘾入骨i
    2021-01-24 00:20

    I was about to post an answer on your next question, but Casimir closed it before I got the chance. I am coming back here to post a proper HTML parse-then-replace technique for the benefit of researchers and you.

    Code:

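    // Lookup table keyed by placeholder type, then by placeholder name.
    // Declared as a constant so the callback can read it without use().
    define('LOOKUP', [
        'block' => [
            'welcome-intro'         => 'custom intro',
        ],
        'variable' => [
            'contact-email-address' => 'mmu@mmu.com',
            'system_version'        => 'sys ver',
            'system_name'           => 'sys name',
            'system_login'          => 'sys login',
            'activate_url'          => 'some url',
        ],
    ]);

    // $html holds the HTML document from the question.
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // buffer libxml warnings instead of displaying them
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // Select the text nodes of every element that is not a textarea/select/input
    // and whose text contains '{{{'.
    foreach ($xpath->query("//*[not(self::textarea or self::select or self::input) and contains(., '{{{')]/text()") as $node) {
        $node->nodeValue = preg_replace_callback(
            '~{{{([^:]+):([^}]+)}}}~',
            function ($m) {
                // $m[1] is the placeholder type, $m[2] is the placeholder name.
                return LOOKUP[$m[1]][$m[2]] ?? '**unknown variable**';
            },
            $node->nodeValue
        );
    }
    echo $dom->saveHTML();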
    

    Output:

    
    Test
        
    custom intro

    You are using system version: sys ver

    Your address: mmu@mmu.com

    Personal information
    Enter at least 2 characters and a maximum of 12 characters.
    Your password must be at least 12 characters long, contain 1 special character, 1 nunber, 1 lower case character and 1 upper case character.
    Biographical information
    A minimum of 40 characters and a maximum of 255 is allowed. This hint is displayed inline.
    A minimum of 40 characters is required. This hint is displayed inline.

    There aren't too many tricks involved.

    1. Parse the HTML with DOMDocument and write an XPath query that selects the text nodes of any element that is not a textarea, select, or input tag and whose text contains {{{. There are several other "magical" ways to filter the DOM; this is just one that feels efficient and direct to me.

    2. I use preg_replace_callback() to perform replacements based on a lookup array.

    3. To avoid use() in the callback syntax, I make the lookup available inside the callback's scope by declaring it as a constant (I can't imagine you need it to be a variable anyhow).

    4. I found during testing that DOMDocument complained about some of the tags in the document, so I silenced the complaints with libxml_use_internal_errors(true).
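
    Steps 2 and 3 can be sketched in isolation on a plain string. This is a minimal sketch, not the code from the answer: replacePlaceholders() and $lookup are hypothetical names, the lookup is trimmed to two entries, and it deliberately takes the other route mentioned in step 3 by keeping the lookup in a variable and importing it into the callback with use.

```php
<?php
// Hypothetical lookup table, held in a variable instead of a constant.
$lookup = [
    'block'    => ['welcome-intro'  => 'custom intro'],
    'variable' => ['system_version' => 'sys ver'],
];

// Hypothetical helper: replaces {{{type:name}}} placeholders in a plain string.
function replacePlaceholders(string $text, array $lookup): string
{
    $result = preg_replace_callback(
        '~{{{([^:]+):([^}]+)}}}~',
        // `use ($lookup)` copies the array into the closure's scope.
        function (array $m) use ($lookup): string {
            // $m[1] is the type, $m[2] is the name; ?? covers missing keys.
            return $lookup[$m[1]][$m[2]] ?? '**unknown variable**';
        },
        $text
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $text;
}

echo replacePlaceholders('Version: {{{variable:system_version}}}', $lookup), "\n";
echo replacePlaceholders('{{{variable:missing}}}', $lookup), "\n";
```

    The two echo lines print "Version: sys ver" and "**unknown variable**". The constant keeps the callback shorter; the use form is worth the extra syntax only if the lookup has to change at runtime.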

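    Steps 1 and 4 can be checked on a small document. A rough sketch under stated assumptions: the markup and the name placeholder are invented for illustration, and the HTML5 <main> element is only there because some libxml versions report tags they do not recognize, which may or may not happen on a given install.

```php
<?php
// Hypothetical sample markup: one placeholder in a paragraph, one inside a textarea.
$html = '<main><p>Hello {{{variable:name}}}</p>'
      . '<textarea>{{{variable:name}}}</textarea></main>';

libxml_use_internal_errors(true);   // buffer parser warnings instead of displaying them
$dom = new DOMDocument();
$dom->loadHTML($html);
$warnings = libxml_get_errors();    // whatever libxml buffered, possibly nothing
libxml_clear_errors();              // empty the buffer so it does not grow

$xpath = new DOMXPath($dom);
$query = "//*[not(self::textarea or self::select or self::input)"
       . " and contains(., '{{{')]/text()";

// Collect the text nodes the filter selects.
$matched = [];
foreach ($xpath->query($query) as $node) {
    $matched[] = $node->nodeValue;
}

print_r($matched);
echo count($warnings), " libxml message(s) buffered\n";
```

    Only the paragraph's text node is selected, so $matched holds the single string "Hello {{{variable:name}}}". The textarea's text is never visited, which is why a replacement loop built on this query leaves form fields alone.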