问题
I am looking for a function like strpos() with two significant differences:
- To be able to accept multiple needles. I mean thousands of needles at ones.
- To search for all occurrences of the needles in the haystack and to return an array of starting positions.
Of course it has to be an efficient solution not just a loop through every needle. I have searched through this forum and there were similar questions to this one, like:
- Using an array as needles in strpos
- Define multiple needles using stripos
- Can't search an array in PHP in_array for the presence of multiple needles
but nether of them was what I am looking for. I am using strpos just to illustrate my question better, probably something entirely different has to be used for this purpose.
I am aware of Zend_Search_Lucene and I am interested if it can be used to achieve this and how (just the general idea)?
Thanks a lot for Your help and time!
回答1:
Here's some sample code for my strategy:
function strpos_array($haystack, $needles, $offset=0) {
$matches = array();
//Avoid the obvious: when haystack or needles are empty, return no matches
if(empty($needles) || empty($haystack)) {
return $matches;
}
$haystack = (string)$haystack; //Pre-cast non-string haystacks
$haylen = strlen($haystack);
//Allow negative (from end of haystack) offsets
if($offset < 0) {
$offset += $heylen;
}
//Use strpos if there is no array or only one needle
if(!is_array($needles)) {
$needles = array($needles);
}
$needles = array_unique($needles); //Not necessary if you are sure all needles are unique
//Precalculate needle lengths to save time
foreach($needles as &$origNeedle) {
$origNeedle = array((string)$origNeedle, strlen($origNeedle));
}
//Find matches
for(; $offset < $haylen; $offset++) {
foreach($needles as $needle) {
list($needle, $length) = $needle;
if($needle == substr($haystack, $offset, $length)) {
$matches[] = $offset;
break;
}
}
}
return($matches);
}
I've implemented a simple brute force method above that will work with any combination of needles and haystacks (not just words). For possibly faster algorithms check out:
- Aho–Corasick string matching algorithm
Other Solution
function strpos_array($haystack, $needles, $theOffset=0) {
$matches = array();
if(empty($haystack) || empty($needles)) {
return $matches;
}
$haylen = strlen($haystack);
if($theOffset < 0) { // Support negative offsets
$theOffest += $haylen;
}
foreach($needles as $needle) {
$needlelen = strlen($needle);
$offset = $theOffset;
while(($match = strpos($haystack, $needle, $offset)) !== false) {
$matches[] = $match;
$offset = $match + $needlelen;
if($offset >= $haylen) {
break;
}
}
}
return $matches;
}
回答2:
try preg match for multiple
if (preg_match('/word|word2/i', $str))
Checking for multiple strpos values
回答3:
I know this doesn't answer the OP's question but wanted to comment since this page is at the top of Google for strpos with multiple needles. Here's a simple solution to do so (again, this isn't specific to the OP's question - sorry):
$img_formats = array('.jpg','.png');
$missing = array();
foreach ( $img_formats as $format )
if ( stripos($post['timer_background_image'], $format) === false ) $missing[] = $format;
if (count($missing) == 2)
return array("save_data"=>$post,"error"=>array("message"=>"The background image must be in a .jpg or .png format.","field"=>"timer_background_image"));
If 2 items are added to the $missing array that means that the input doesn't satisfy any of the image formats in the $img_formats array. At that point you know that you can return an error, etc. This could easily be turned into a little function:
function m_stripos( $haystack = null, $needles = array() ){
//return early if missing arguments
if ( !$needles || !$haystack ) return false;
// create an array to evaluate at the end
$missing = array();
//Loop through needles array, and add to $missing array if not satisfied
foreach ( $needles as $needle )
if ( stripos($haystack, $needle) === false ) $missing[] = $needle;
//If the count of $missing and $needles is equal, we know there were no matches, return false..
if (count($missing) == count($needles)) return false;
//If we're here, be happy, return true...
return true;
}
Back to our first example using then the function instead:
$needles = array('.jpg','.png');
if ( !m_strpos( $post['timer_background_image'], $needles ) )
return array("save_data"=>$post,"error"=>array("message"=>"The background image must be in a .jpg or .png format.","field"=>"timer_background_image"));
Of course, what you do after the function returns true or false is up to you.
回答4:
It seems you are searching for whole words. In this case, something like this might help. As it uses built-in functions, it should be faster than custom code, but you have to profile it:
$words = str_word_count($str, 2);
$word_position_map = array();
foreach($words as $position => $word) {
if(!isset($word_position_map[$word])) {
$word_position_map[$word] = array();
}
$word_position_map[$word][] = $position;
}
// assuming $needles is an array of words
$result = array_intersect_key($word_position_map, array_flip($needles));
Storing the information (like the needles) in the right format will improve the runtime ( e.g. as you don't have to call array_flip
).
Note from the str_word_count documentation:
For the purpose of this function, 'word' is defined as a locale dependent string containing alphabetic characters, which also may contain, but not start with "'" and "-" characters.
So make sure you set the locale right.
回答5:
You could use a regular expression, they support OR operations. This would however make it fairly slow, compared to strpos.
来源:https://stackoverflow.com/questions/6896587/strpos-with-multiple-needles