Is there a way to simulate the LIKE operator of SQL in PHP with the same syntax (% and _ wildcards and a generic $escape escape character)?
OK, after much fun and games here's what I have come up with:
function preg_sql_like ($input, $pattern, $escape = '\\') {
    // Split the pattern into special sequences and the rest
    $expr = '/((?:'.preg_quote($escape, '/').')?(?:'.preg_quote($escape, '/').'|%|_))/';
    $parts = preg_split($expr, $pattern, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    // Loop the split parts and convert/escape as necessary to build regex
    $expr = '/^';
    $lastWasPercent = FALSE;
    foreach ($parts as $part) {
        switch ($part) {
            case $escape.$escape:
                $expr .= preg_quote($escape, '/');
                break;
            case $escape.'%':
                $expr .= '%';
                break;
            case $escape.'_':
                $expr .= '_';
                break;
            case '%':
                if (!$lastWasPercent) {
                    $expr .= '.*?';
                }
                break;
            case '_':
                $expr .= '.';
                break;
            default:
                $expr .= preg_quote($part, '/');
                break;
        }
        $lastWasPercent = $part == '%';
    }
    $expr .= '$/i';

    // Look for a match and return bool
    return (bool) preg_match($expr, $input);
}
I can't break it, maybe you can find something that will. The main way in which mine differs from @nickb's is that mine "parses"(ish) the input expression into tokens to generate a regex, rather than converting it to a regex in situ.
The three arguments to the function should be fairly self-explanatory. An earlier version took a fourth argument for passing PCRE modifiers, mainly so that you could pass i to make the match case-insensitive; it was removed per the comments below.
The function simply returns a boolean indicating whether the $input text matched the $pattern or not.
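The tokenizing step is easy to inspect on its own. The following is a minimal sketch, not part of the function above: it runs the same preg_split call on a sample pattern, using ! as the escape character. The sample pattern and the variable names are made up for illustration.

```php
<?php
// Assumed sample values, not taken from the answer above.
$escape  = '!';
$pattern = 'a!%b%_c';

// Same splitting expression as in preg_sql_like(): an optional escape
// character followed by an escape character, % or _.
$quoted = preg_quote($escape, '/');
$expr   = '/((?:' . $quoted . ')?(?:' . $quoted . '|%|_))/';

// DELIM_CAPTURE keeps the matched sequences as tokens of their own;
// NO_EMPTY drops the empty strings between adjacent tokens.
$parts = preg_split($expr, $pattern, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

print_r($parts);
```

Each element is either a run of literal text, a bare wildcard, or an escape sequence such as !%, which is what lets the switch statement handle each kind separately.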
Here's a codepad of it
EDIT Oops, was broken, now fixed. New codepad
EDIT Removed fourth argument and made all matches case-insensitive per comments below
EDIT A couple of small fixes/improvements, including the handling of repeated .*? sequences in the generated regex.

This is basically how you would implement something like this:
$input = '%ST!_ING_!%';
$value = 'ANYCHARS HERE TEST_INGS%';
// Mapping of wildcards to their PCRE equivalents
$wildcards = array( '%' => '.*?', '_' => '.');
// Escape character for preventing wildcard functionality on a wildcard
$escape = '!';
// Shouldn't have to modify much below this
$delimiter = '/'; // regex delimiter
// Quote the escape characters and the wildcard characters
$quoted_escape = preg_quote( $escape);
$quoted_wildcards = array_map( function( $el) { return preg_quote( $el); }, array_keys( $wildcards));
// Form the dynamic regex for the wildcards by replacing the "fake" wildcards with PCRE ones
$temp_regex = '((?:' . $quoted_escape . ')?)(' . implode( '|', $quoted_wildcards) . ')';
// Escape the regex delimiter if it's present within the regex
$wildcard_replacement_regex = $delimiter . str_replace( $delimiter, '\\' . $delimiter, $temp_regex) . $delimiter;
// Do the actual replacement
$regex = preg_replace_callback( $wildcard_replacement_regex, function( $matches) use( $wildcards) { return !empty( $matches[1]) ? preg_quote( $matches[2]) : $wildcards[$matches[2]]; }, preg_quote( $input));
// Finally, test the regex against the input $value, escaping the delimiter if it's present
preg_match( $delimiter . str_replace( $delimiter, '\\' . $delimiter, $regex) . $delimiter .'i', $value, $matches);
// Output is in $matches[0] if there was a match (null otherwise)
var_dump( $matches[0] ?? null);
This forms a dynamic regex based on $wildcards and $escape in order to replace all "fake" wildcards with their PCRE equivalents, unless the "fake" wildcard character is prefixed with the escape character, in which case no replacement is made. In order to do this replacement, the $wildcard_replacement_regex is created.
The $wildcard_replacement_regex looks something like this once everything's all said and done:
/((?:\!)?)(%|_)/
So it uses two capturing groups to (optionally) grab the escape character and one of the wildcards. This enables us to test whether it grabbed the escape character in the callback. If it was able to get the escape character before the wildcard, $matches[1] will contain the escape character. If not, $matches[1] will be empty. This is how I determine whether to replace the wildcard with its PCRE equivalent, or leave it alone by just preg_quote()-ing it.
You can play around with it at codepad.
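To see the two capturing groups at work without the rest of the script, here is a reduced sketch. It is an illustration under assumptions rather than the code above: the sample string 'a!_b%c' is made up, the wildcard alternation is hard-coded, and the input is not passed through preg_quote() first.

```php
<?php
// Assumed sample values for illustration only.
$escape    = '!';
$wildcards = ['%' => '.*?', '_' => '.'];

// Group 1: optional escape character. Group 2: one of the wildcards.
$regex = '/((?:' . preg_quote($escape, '/') . ')?)(%|_)/';

$converted = preg_replace_callback(
    $regex,
    function (array $m) use ($wildcards) {
        // A non-empty group 1 means the wildcard was escaped: keep it literal.
        // Otherwise substitute the PCRE equivalent.
        return $m[1] !== '' ? preg_quote($m[2], '/') : $wildcards[$m[2]];
    },
    'a!_b%c'
);

var_dump($converted);
```

The escaped !_ comes out as a literal underscore, while the bare % becomes .*?, so the converted string is a_b.*?c.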
The other examples were a bit too complex for my taste (and painful to my clean code eyes), so I reimplemented the functionality in this simple method:
public function like($needle, $haystack, $delimiter = '~')
{
    // Escape meta-characters from the string so that they don't gain special significance in the regex
    $needle = preg_quote($needle, $delimiter);

    // Replace SQL wildcards with regex wildcards
    $needle = str_replace('%', '.*?', $needle);
    $needle = str_replace('_', '.', $needle);

    // Add delimiters, beginning + end of line and modifiers
    $needle = $delimiter . '^' . $needle . '$' . $delimiter . 'isu';

    // Matches are not useful in this case; we just need to know whether or not the needle was found.
    return (bool) preg_match($needle, $haystack);
}
Modifiers:
i: Ignore casing.
s: Make dot metacharacter match anything, including newlines.
u: UTF-8 compatibility.

You can use regexp, for example: preg_match.
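For anyone unsure what the s and u modifiers change in a preg_match call, the following small sketch compares the same pattern with and without each of them. The patterns and test strings are invented for the demonstration, and "\u{00E9}" (the letter é) needs PHP 7 or later.

```php
<?php
// 's': without it, the dot does not match a newline.
$withoutS = preg_match('~^a.*?b$~i', "a\nb");
$withS    = preg_match('~^a.*?b$~is', "a\nb");

// 'u': without it, the dot matches a single byte, so one two-byte
// UTF-8 character cannot satisfy a pattern that allows one character.
$withoutU = preg_match('~^.$~is', "\u{00E9}");
$withU    = preg_match('~^.$~isu', "\u{00E9}");

var_dump($withoutS, $withS, $withoutU, $withU);
```

In SQL LIKE terms, s lets % span line breaks and u lets _ stand for one whole multi-byte character rather than one byte.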