Simulating LIKE in PHP

忘掉有多难 2021-01-05 05:21

Is there a way to simulate SQL's LIKE operator in PHP with the same syntax, i.e. the % and _ wildcards plus a generic $escape escape character?

4 Answers
  • 2021-01-05 05:35

    OK, after much fun and games here's what I have come up with:

    function preg_sql_like ($input, $pattern, $escape = '\\') {
    
        // Split the pattern into special sequences and the rest
        $expr = '/((?:'.preg_quote($escape, '/').')?(?:'.preg_quote($escape, '/').'|%|_))/';
        $parts = preg_split($expr, $pattern, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    
        // Loop the split parts and convert/escape as necessary to build regex
        $expr = '/^';
        $lastWasPercent = FALSE;
        foreach ($parts as $part) {
            switch ($part) {
                case $escape.$escape:
                    $expr .= preg_quote($escape, '/');
                    break;
                case $escape.'%':
                    $expr .= '%';
                    break;
                case $escape.'_':
                    $expr .= '_';
                    break;
                case '%':
                    if (!$lastWasPercent) {
                        $expr .= '.*?';
                    }
                    break;
                case '_':
                    $expr .= '.';
                    break;
                default:
                    $expr .= preg_quote($part, '/');
                    break;
            }
            $lastWasPercent = $part == '%';
        }
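        // i: case-insensitive; s: let '.' match newlines, as LIKE's % and _ do;
        // D: '$' matches only at the very end, not before a trailing newline
        $expr .= '$/isD';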
    
        // Look for a match and return bool
        return (bool) preg_match($expr, $input);
    
    }
    

    I can't break it; maybe you can find something that will. The main way mine differs from @nickb's is that mine "parses" (ish) the pattern into tokens and builds a regex from them, rather than converting the pattern to a regex in situ.
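    This minimal sketch makes the tokenising step concrete. It runs only the preg_split() call from the function above on two made-up sample patterns and assumes the default backslash escape.

    ```php
    <?php
    // Tokenising step of preg_sql_like(), run on its own with the default
    // backslash escape. Escaped pairs and bare wildcards are captured as
    // separate tokens; runs of ordinary text stay together.
    $escape = '\\';
    $q = preg_quote($escape, '/');
    $splitter = '/((?:' . $q . ')?(?:' . $q . '|%|_))/';
    $flags = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY;

    // Pattern a\%b%_c  ->  tokens: a, \%, b, %, _, c
    $tokens = preg_split($splitter, 'a\\%b%_c', -1, $flags);

    // Pattern x\\y (an escaped escape)  ->  tokens: x, \\, y
    $escapedEscape = preg_split($splitter, 'x\\\\y', -1, $flags);

    print_r($tokens);
    print_r($escapedEscape);
    ```

    Each token then maps to exactly one case in the switch statement, which is why the function never has to re-scan already translated text.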

    The three arguments to the function should be fairly self-explanatory: the subject string, the LIKE pattern and the escape character (a backslash by default). An earlier version had a fourth argument for passing PCRE modifiers to the final regex. It existed mainly so you could pass i for case-insensitive matching, and it was removed per the comments below.

    The function simply returns a boolean indicating whether the $input text matched the $pattern.

    Here's a codepad of it

    EDIT: Oops, it was broken; now fixed. New codepad

    EDIT: Removed the fourth argument and made all matches case-insensitive, per the comments below.

    EDIT: A couple of small fixes/improvements:

    • Added start/end-of-string assertions to the generated regex
    • Added tracking of the last token to avoid consecutive .*? sequences in the generated regex
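    This small sketch shows why those two fixes matter. It uses hand-written regexes rather than the function's actual output, and the sample strings are hypothetical. Without ^ and $, a LIKE-style regex also matches substrings. A doubled .*? accepts the same strings as a single one, but it gives the engine more ways to split the input when it backtracks.

    ```php
    <?php
    // 1. Anchors: LIKE 'ab' must match the whole value, not a substring.
    $unanchored = (bool) preg_match('/ab/i', 'xaby');   // substring match succeeds
    $anchored   = (bool) preg_match('/^ab$/i', 'xaby'); // whole-string match fails

    // 2. A pattern like a%%b: both regexes accept the same strings, but the
    //    collapsed one has fewer ways to divide the input between quantifiers.
    $doubled   = (bool) preg_match('/^a.*?.*?b$/i', 'a-x-b');
    $collapsed = (bool) preg_match('/^a.*?b$/i', 'a-x-b');

    var_dump($unanchored, $anchored, $doubled, $collapsed);
    ```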
  • 2021-01-05 05:35

    This is basically how you would implement something like this:

    $input = '%ST!_ING_!%';
    $value = 'ANYCHARS HERE TEST_INGS%';
    
    // Mapping of wildcards to their PCRE equivalents
    $wildcards = array( '%' => '.*?', '_' => '.');
    
    // Escape character for preventing wildcard functionality on a wildcard
    $escape = '!';
    
    // Shouldn't have to modify much below this
    
    $delimiter = '/'; // regex delimiter
    
    // Quote the escape characters and the wildcard characters
    $quoted_escape = preg_quote( $escape);
    $quoted_wildcards = array_map( function( $el) { return preg_quote( $el); }, array_keys( $wildcards));
    
    // Form the dynamic regex for the wildcards by replacing the "fake" wildcards with PCRE ones
    $temp_regex = '((?:' . $quoted_escape . ')?)(' . implode( '|', $quoted_wildcards) . ')';
    
    // Escape the regex delimiter if it's present within the regex
    $wildcard_replacement_regex = $delimiter . str_replace( $delimiter, '\\' . $delimiter, $temp_regex) . $delimiter;
    
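    // Do the actual replacement
    $regex = preg_replace_callback( $wildcard_replacement_regex, function( $matches) use( $wildcards) {
        // A non-empty $matches[1] means the wildcard was escaped, so keep it literal.
        // Compare with '' instead of using empty(), which would also treat an escape character of '0' as missing.
        return $matches[1] !== '' ? preg_quote( $matches[2]) : $wildcards[$matches[2]];
    }, preg_quote( $input));
    
    // Finally, test the regex against the input $value, anchored with ^ and $ so the whole
    // value must match (as with LIKE), escaping the delimiter if it's present
    preg_match( $delimiter . '^' . str_replace( $delimiter, '\\' . $delimiter, $regex) . '$' . $delimiter . 'i', $value, $matches);
    
    // Output is in $matches[0] if there was a match, otherwise NULL
    var_dump( isset( $matches[0]) ? $matches[0] : NULL);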
    

    This forms a dynamic regex based on $wildcards and $escape in order to replace all "fake" wildcards with their PCRE equivalents, unless the "fake" wildcard character is prefixed with the escape character, in which case, no replacement is made. In order to do this replacement, the $wildcard_replacement_regex is created.

    The $wildcard_replacement_regex looks something like this once everything's all said and done:

    /((?:\!)?)(%|_)/
    

    So it uses two capturing groups to (optionally) grab the escape character and one of the wildcards. This enables us to test to see if it grabbed the escape character in the callback. If it was able to get the escape character before the wildcard, $matches[1] will contain the escape character. If not, $matches[1] will be empty. This is how I determine whether to replace the wildcard with its PCRE equivalent, or leave it alone by just preg_quote()-ing it.
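    You can check the callback logic in isolation. This sketch replays the replacement step on the example pattern and records both capture groups for each wildcard. It compares $m[1] with '' rather than calling empty(), and the variable names ($groups, $body, $replacementRegex) are illustrative only. It also shows a side effect of quoting first: the backslash that preg_quote() puts in front of ! ends up in front of the literal wildcard. PCRE then reads \_ and \% as a plain _ and %.

    ```php
    <?php
    // Replays the replacement step on the example pattern, recording what the
    // two capture groups hold for each wildcard found.
    $wildcards = ['%' => '.*?', '_' => '.'];
    $escape = '!';

    // preg_quote() turns '!' into '\!', because '!' is a regex metacharacter.
    $quoted = preg_quote('%ST!_ING_!%');                  // %ST\!_ING_\!%
    $replacementRegex = '/((?:' . preg_quote($escape, '/') . ')?)(%|_)/';

    $groups = [];
    $body = preg_replace_callback(
        $replacementRegex,
        function (array $m) use ($wildcards, &$groups): string {
            $groups[] = [$m[1], $m[2]]; // [escape character or '', wildcard]
            // Escaped wildcard: emit it literally. Otherwise: its PCRE equivalent.
            return $m[1] !== '' ? preg_quote($m[2]) : $wildcards[$m[2]];
        },
        $quoted
    );

    // $body is .*?ST\_ING.\% : the backslash left over from quoting '!' now
    // sits in front of the literal _ and %.
    var_dump($body, $groups);
    var_dump(preg_match('/^' . $body . '$/i', 'ANYCHARS HERE TEST_INGS%'));
    ```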

    You can play around with it at codepad.

  • 2021-01-05 05:41

    The other examples were a bit too complex for my taste (and painful to my clean code eyes), so I reimplemented the functionality in this simple method:

    public function like($needle, $haystack, $delimiter = '~')
    {
        // Escape meta-characters from the string so that they don't gain special significance in the regex
        $needle = preg_quote($needle, $delimiter);
    
        // Replace SQL wildcards with regex wildcards
        $needle = str_replace('%', '.*?', $needle);
        $needle = str_replace('_', '.', $needle);
    
        // Anchor to the start and end of the string, then add the delimiters and modifiers
        $needle = $delimiter . '^' . $needle . '$' . $delimiter . 'isu';
    
        // Matches are not useful in this case; we just need to know whether or not the needle was found.
        return (bool) preg_match($needle, $haystack);
    }
    

    Modifiers:

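    • i: Case-insensitive matching.
    • s: Make the dot metacharacter match any character, including newlines.
    • u: Treat the pattern and subject as UTF-8, so _ matches one character rather than one byte; both strings must be valid UTF-8.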
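    The method above does not take an escape character, which the question asks for. Here is a possible extension, offered as a sketch rather than a drop-in replacement. The name like_with_escape() is hypothetical. The function walks the pattern one character at a time and assumes a single-character escape and valid UTF-8 input.

    ```php
    <?php
    // Hypothetical helper, not part of the answer above: like() plus an escape
    // character. Assumes $escape is a single character and both strings are
    // valid UTF-8 (required by the u modifier).
    function like_with_escape(string $pattern, string $subject, string $escape = '\\'): bool
    {
        // Split the pattern into UTF-8 characters.
        $chars = preg_split('//u', $pattern, -1, PREG_SPLIT_NO_EMPTY);
        $regex = '';
        for ($i = 0, $n = count($chars); $i < $n; $i++) {
            $c = $chars[$i];
            if ($c === $escape && $i + 1 < $n) {
                // Escape character: the next character is matched literally,
                // even if it is %, _ or the escape character itself.
                $regex .= preg_quote($chars[++$i], '~');
            } elseif ($c === '%') {
                $regex .= '.*?';
            } elseif ($c === '_') {
                $regex .= '.';
            } else {
                // Ordinary character (or a trailing escape with nothing after it).
                $regex .= preg_quote($c, '~');
            }
        }
        // Same i, s, u modifiers as like(); D stops '$' from also matching
        // just before a trailing newline in $subject.
        return preg_match('~^' . $regex . '$~isuD', $subject) === 1;
    }
    ```

    Handling one character at a time keeps the escape logic simple. An escaped character is consumed together with its escape, so it never reaches the wildcard branches. For example, like_with_escape('100!%', $s, '!') only accepts the literal string 100%.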
  • 2021-01-05 05:48

    You can use regular expressions, for example with preg_match().
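    As a minimal illustration of that idea, here is one LIKE pattern translated to a regex by hand. The function name_matches() and the pattern J_n% are hypothetical, and the translation assumes no escape character is involved.

    ```php
    <?php
    // Hand-written translation of a single LIKE pattern:
    //   SQL:   name LIKE 'J_n%'
    //   PCRE:  /^J.n.*$/is   (% becomes .*, _ becomes .)
    function name_matches(string $name): bool
    {
        // ^ and $ anchor the whole string, i makes the match case-insensitive,
        // and s lets . match newlines too.
        return preg_match('/^J.n.*$/is', $name) === 1;
    }

    var_dump(name_matches('Jane'), name_matches('John'));
    ```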
