I have a website with stories in it. I can have multiple types of stories within multiple categories like:
If you know what the possible correct URLs could be, you can use:
levenshtein($givenURL, $possibleURL)
Example from PHP docs, comments removed for brevity:
$input = 'carrrot';
$words = array('apple','pineapple','banana','orange',
    'radish','carrot','pea','bean','potato');
$shortest = -1;
foreach ($words as $word) {
    $lev = levenshtein($input, $word);
    if ($lev == 0) {
        $closest = $word;
        $shortest = 0;
        break;
    }
    if ($lev <= $shortest || $shortest < 0) {
        $closest = $word;
        $shortest = $lev;
    }
}
echo "Input word: $input\n";
echo $shortest == 0 ? "Exact match found: $closest\n" : "Did you mean: $closest?\n";
Outputs:
Input word: carrrot
Did you mean: carrot?
This is good when you think people may have omitted a letter or put an extra one in, but it may fall short when people genuinely don't know how to spell a word and came up with something creative!
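Applied to URLs, the same loop might look like the sketch below. The function name closestUrl(), the example paths, and the $maxDistance cutoff are all assumptions made for illustration; the cutoff is just one way to avoid suggesting a page that has nothing to do with what was requested.

```php
<?php
// Hypothetical helper: returns the known URL closest to the requested one,
// or null if nothing is within $maxDistance edits.
function closestUrl(string $givenURL, array $possibleURLs, int $maxDistance = 3): ?string
{
    $closest = null;
    $shortest = $maxDistance + 1; // anything at or above this is rejected

    foreach ($possibleURLs as $possibleURL) {
        $lev = levenshtein($givenURL, $possibleURL);
        if ($lev < $shortest) {
            $closest = $possibleURL;
            $shortest = $lev;
        }
    }

    return $closest;
}

// Made-up paths; in practice these would come from your own routes or database.
$possibleURLs = array(
    '/stories/fiction/the-long-road',
    '/stories/poetry/autumn',
    '/stories/fiction/night-train',
);

// One letter missing in "fiction", so the first path is one edit away.
echo closestUrl('/stories/ficton/the-long-road', $possibleURLs), "\n";
```

Note that before PHP 8.0, levenshtein() rejects strings longer than 255 characters, so very long URLs may need trimming first.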
If you prefer the soundex() route, take a look at the metaphone() function.
I like the idea of using metaphone() alongside levenshtein() or similar_text(), as it returns a phonetic representation of the word, and you still want to see how similar it is to your original.
Examples:
metaphone('name') = NM
metaphone('naaaaaameeeeeeee') = NM
metaphone('naiym') = NM
metaphone('naiyem') = NYM
While a lot of misspellings will return an identical match, the last example shows that you still want to find the closest match with something like levenshtein().
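One way to combine the two, as a rough sketch: compare the metaphone() keys with levenshtein() instead of comparing the raw strings. The function name closestBySound() and the word list are assumptions for illustration only.

```php
<?php
// Hypothetical helper: picks the candidate whose metaphone key is closest
// (by edit distance) to the metaphone key of the input.
function closestBySound(string $input, array $words): ?string
{
    $inputKey = metaphone($input);
    $closest = null;
    $shortest = -1;

    foreach ($words as $word) {
        $lev = levenshtein($inputKey, metaphone($word));
        if ($shortest < 0 || $lev < $shortest) {
            $closest = $word;
            $shortest = $lev;
        }
    }

    return $closest;
}

// 'naiyem' does not share a key with 'name', but its key is still the nearest one.
echo closestBySound('naiyem', array('name', 'potato', 'banana')), "\n";
```

Phonetic keys are short, so ties are more likely than with raw strings; falling back to levenshtein() on the original strings to break ties would be a reasonable refinement.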
For efficiency, if you use a separate 404 file for requests where the rewrites tried to match this pattern and failed, rather than the one you use for the rest of the site, it really shouldn't be a massive overhead.
If you're getting the same 404 from the same referrer a lot (and can't get them to change the link), it might be worth just putting a static rewrite in for that case.