I have an array like this:
$sports = array(
\'Softball - Counties\',
\'Softball - Eastern\',
\'Softball - North Harbour\',
\'Softball - South\',
\'Softball -
Here's an elegant, recursive implementation in JavaScript:
function prefix(strings) {
switch (strings.length) {
case 0:
return "";
case 1:
return strings[0];
case 2:
// compute the prefix between the two strings
var a = strings[0],
b = strings[1],
n = Math.min(a.length, b.length),
i = 0;
while (i < n && a.charAt(i) === b.charAt(i))
++i;
return a.substring(0, i);
default:
// return the common prefix of the first string,
// and the common prefix of the rest of the strings
return prefix([ strings[0], prefix(strings.slice(1)) ]);
}
}
How about something like this? It can be further optimised by not having to check the lengths of the strings if we can use the null terminating character (but I am assuming python strings have length cached somewhere?)
def find_common_prefix_len(strings):
"""
Given a list of strings, finds the length common prefix in all of them.
So
apple
applet
application
would return 3
"""
prefix = 0
curr_index = -1
num_strings = len(strings)
string_lengths = [len(s) for s in strings]
while True:
curr_index += 1
ch_in_si = None
for si in xrange(0, num_strings):
if curr_index >= string_lengths[si]:
return prefix
else:
if si == 0:
ch_in_si = strings[0][curr_index]
elif strings[si][curr_index] != ch_in_si:
return prefix
prefix += 1
@bumperbox
Your basic code needed some correction to work in ALL scenarios!
Currently your algorithm fails
In my next answer/post, I will attach iteration optimized code!
Original Bumperbox code PLUS correction (PHP):
function shortest($sports) {
$i = 1;
// loop to the length of the first string
while ($i < strlen($sports[0])) {
// grab the left most part up to i in length
// REMARK: Culturally biased towards LTR writing systems. Better say: Grab frombeginning...
$match = substr($sports[0], 0, $i);
// loop through all the values in array, and compare if they match
foreach ($sports as $sport) {
if ($match != substr($sport, 0, $i)) {
// didn't match, return the part that did match
return substr($sport, 0, $i-1);
}
}
$i++; // increase string length
}
}
function shortestCorrect($sports) {
$i = 1;
while ($i <= strlen($sports[0]) + 1) {
// Grab the string from its beginning with length $i
$match = substr($sports[0], 0, $i);
foreach ($sports as $sport) {
if ($match != substr($sport, 0, $i)) {
return substr($sport, 0, $i-1);
}
}
$i++;
}
// Special case: No mismatch happened until loop end! Thus entire str1 is common prefix!
return $sports[0];
}
$sports1 = array(
'Softball',
'Softball - Eastern',
'Softball - North Harbour');
$sports2 = array(
'Softball - Wester',
'Softball - Western',
);
$sports3 = array(
'Softball - Western',
'Softball - Western',
);
$sports4 = array(
'Softball - Westerner',
'Softball - Western',
);
echo("Output of the original function:\n"); // Failure scenarios
var_dump(shortest($sports1)); // NULL rather than the correct 'Softball'
var_dump(shortest($sports2)); // NULL rather than the correct 'Softball - Wester'
var_dump(shortest($sports3)); // NULL rather than the correct 'Softball - Western'
var_dump(shortest($sports4)); // Only works if the second string is at least one character longer!
echo("\nOutput of the corrected function:\n"); // All scenarios work
var_dump(shortestCorrect($sports1));
var_dump(shortestCorrect($sports2));
var_dump(shortestCorrect($sports3));
var_dump(shortestCorrect($sports4));
I implemented @diogoriba algorithm into code, with this result:
Another idea I implemented:
First check for the shortest string in the array, and use this for comparison rather than simply the first string. In the code, this is implemented with the custom written function arrayStrLenMin().
Code + Extensive Testing + Remarks:
function arrayStrLenMin ($arr, $strictMode = false, $forLoop = false) {
$errArrZeroLength = -1; // Return value for error: Array is empty
$errOtherType = -2; // Return value for error: Found other type (than string in array)
$errStrNone = -3; // Return value for error: No strings found (in array)
$arrLength = count($arr);
if ($arrLength <= 0 ) { return $errArrZeroLength; }
$cur = 0;
foreach ($arr as $key => $val) {
if (is_string($val)) {
$min = strlen($val);
$strFirstFound = $key;
// echo("Key\tLength / Notification / Error\n");
// echo("$key\tFound first string member at key with length: $min!\n");
break;
}
else if ($strictMode) { return $errOtherType; } // At least 1 type other than string was found.
}
if (! isset($min)) { return $errStrNone; } // No string was found in array.
// SpeedRatio of foreach/for is approximately 2/1 as dicussed at:
// http://juliusbeckmann.de/blog/php-foreach-vs-while-vs-for-the-loop-battle.html
// If $strFirstFound is found within the first 1/SpeedRatio (=0.5) of the array, "foreach" is faster!
if (! $forLoop) {
foreach ($arr as $key => $val) {
if (is_string($val)) {
$cur = strlen($val);
// echo("$key\t$cur\n");
if ($cur == 0) { return $cur; } // 0 is the shortest possible string, so we can abort here.
if ($cur < $min) { $min = $cur; }
}
// else { echo("$key\tNo string!\n"); }
}
}
// If $strFirstFound is found after the first 1/SpeedRatio (=0.5) of the array, "for" is faster!
else {
for ($i = $strFirstFound + 1; $i < $arrLength; $i++) {
if (is_string($arr[$i])) {
$cur = strlen($arr[$i]);
// echo("$i\t$cur\n");
if ($cur == 0) { return $cur; } // 0 is the shortest possible string, so we can abort here.
if ($cur < $min) { $min = $cur; }
}
// else { echo("$i\tNo string!\n"); }
}
}
return $min;
}
function strCommonPrefixByStr($arr, $strFindShortestFirst = false) {
$arrLength = count($arr);
if ($arrLength < 2) { return false; }
// Determine loop length
/// Find shortest string in array: Can bring down iterations dramatically, but the function arrayStrLenMin() itself can cause ( more or less) iterations.
if ($strFindShortestFirst) { $end = arrayStrLenMin($arr, true); }
/// Simply start with length of first string in array: Seems quite clumsy, but may turn out effective, if arrayStrLenMin() needs many iterations.
else { $end = strlen($arr[0]); }
for ($i = 1; $i <= $end + 1; $i++) {
// Grab the part from 0 up to $i
$commonStrMax = substr($arr[0], 0, $i);
echo("Match: $i\t$commonStrMax\n");
// Loop through all the values in array, and compare if they match
foreach ($arr as $key => $str) {
echo(" Str: $key\t$str\n");
// Didn't match, return the part that did match
if ($commonStrMax != substr($str, 0, $i)) {
return substr($commonStrMax, 0, $i-1);
}
}
}
// Special case: No mismatch (hence no return) happened until loop end!
return $commonStrMax; // Thus entire first common string is the common prefix!
}
function strCommonPrefixByChar($arr, $strFindShortestFirst = false) {
$arrLength = count($arr);
if ($arrLength < 2) { return false; }
// Determine loop length
/// Find shortest string in array: Can bring down iterations dramatically, but the function arrayStrLenMin() itself can cause ( more or less) iterations.
if ($strFindShortestFirst) { $end = arrayStrLenMin($arr, true); }
/// Simply start with length of first string in array: Seems quite clumsy, but may turn out effective, if arrayStrLenMin() needs many iterations.
else { $end = strlen($arr[0]); }
for ($i = 0 ; $i <= $end + 1; $i++) {
// Grab char $i
$char = substr($arr[0], $i, 1);
echo("Match: $i\t"); echo(str_pad($char, $i+1, " ", STR_PAD_LEFT)); echo("\n");
// Loop through all the values in array, and compare if they match
foreach ($arr as $key => $str) {
echo(" Str: $key\t$str\n");
// Didn't match, return the part that did match
if ($char != $str[$i]) { // Same functionality as ($char != substr($str, $i, 1)). Same efficiency?
return substr($arr[0], 0, $i);
}
}
}
// Special case: No mismatch (hence no return) happened until loop end!
return substr($arr[0], 0, $end); // Thus entire first common string is the common prefix!
}
function strCommonPrefixByNeighbour($arr) {
$arrLength = count($arr);
if ($arrLength < 2) { return false; }
/// Get the common string prefix of the first 2 strings
$strCommonMax = strCommonPrefixByChar(array($arr[0], $arr[1]));
if ($strCommonMax === false) { return false; }
if ($strCommonMax == "") { return ""; }
$strCommonMaxLength = strlen($strCommonMax);
/// Now start looping from the 3rd string
echo("-----\n");
for ($i = 2; ($i < $arrLength) && ($strCommonMaxLength >= 1); $i++ ) {
echo(" STR: $i\t{$arr[$i]}\n");
/// Compare the maximum common string with the next neighbour
/*
//// Compare by char: Method unsuitable!
// Iterate from string end to string beginning
for ($ii = $strCommonMaxLength - 1; $ii >= 0; $ii--) {
echo("Match: $ii\t"); echo(str_pad($arr[$i][$ii], $ii+1, " ", STR_PAD_LEFT)); echo("\n");
// If you find the first mismatch from the end, break.
if ($arr[$i][$ii] != $strCommonMax[$ii]) {
$strCommonMaxLength = $ii - 1; break;
// BUT!!! We may falsely assume that the string from the first mismatch until the begining match! This new string neighbour string is completely "unexplored land", there might be differing chars closer to the beginning. This method is not suitable. Better use string comparison than char comparison.
}
}
*/
//// Compare by string
for ($ii = $strCommonMaxLength; $ii > 0; $ii--) {
echo("MATCH: $ii\t$strCommonMax\n");
if (substr($arr[$i],0,$ii) == $strCommonMax) {
break;
}
else {
$strCommonMax = substr($strCommonMax,0,$ii - 1);
$strCommonMaxLength--;
}
}
}
return substr($arr[0], 0, $strCommonMaxLength);
}
// Tests for finding the common prefix
/// Scenarios
$filesLeastInCommon = array (
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/2",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/1",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/2",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/c/1",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
);
$filesLessInCommon = array (
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/2",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/1",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/2",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/c/1",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
);
$filesMoreInCommon = array (
"/Voluuuuuuuuuuuuuumes/1/a/a/1",
"/Voluuuuuuuuuuuuuumes/1/a/a/2",
"/Voluuuuuuuuuuuuuumes/1/a/b/1",
"/Voluuuuuuuuuuuuuumes/1/a/b/2",
"/Voluuuuuuuuuuuuuumes/2/a/b/c/1",
"/Voluuuuuuuuuuuuuumes/2/a/a/1",
);
$sameDir = array (
"/Volumes/1/a/a/",
"/Volumes/1/a/a/aaaaa/2",
);
$sameFile = array (
"/Volumes/1/a/a/1",
"/Volumes/1/a/a/1",
);
$noCommonPrefix = array (
"/Volumes/1/a/a/",
"/Volumes/1/a/a/aaaaa/2",
"Net/1/a/a/aaaaa/2",
);
$longestLast = array (
"/Volumes/1/a/a/1",
"/Volumes/1/a/a/aaaaa/2",
);
$longestFirst = array (
"/Volumes/1/a/a/aaaaa/1",
"/Volumes/1/a/a/2",
);
$one = array ("/Volumes/1/a/a/aaaaa/1");
$empty = array ( );
// Test Results for finding the common prefix
/*
I tested my functions in many possible scenarios.
The results, the common prefixes, were always correct in all scenarios!
Just try a function call with your individual array!
Considering iteration efficiency, I also performed tests:
I put echo functions into the functions where iterations occur, and measured the number of CLI line output via:
php <script with strCommonPrefixByStr or strCommonPrefixByChar> | egrep "^ Str:" | wc -l GIVES TOTAL ITERATION SUM.
php <Script with strCommonPrefixByNeighbour> | egrep "^ Str:" | wc -l PLUS | egrep "^MATCH:" | wc -l GIVES TOTAL ITERATION SUM.
My hypothesis was proven:
strCommonPrefixByChar wins in situations where the strings have less in common in their beginning (=prefix).
strCommonPrefixByNeighbour wins where there is more in common in the prefixes.
*/
// Test Results Table
// Used Functions | Iteration amount | Remarks
// $result = (strCommonPrefixByStr($filesLessInCommon)); // 35
// $result = (strCommonPrefixByChar($filesLessInCommon)); // 35 // Same amount of iterations, but much fewer characters compared because ByChar instead of ByString!
// $result = (strCommonPrefixByNeighbour($filesLessInCommon)); // 88 + 42 = 130 // Loses in this category!
// $result = (strCommonPrefixByStr($filesMoreInCommon)); // 137
// $result = (strCommonPrefixByChar($filesMoreInCommon)); // 137 // Same amount of iterations, but much fewer characters compared because ByChar instead of ByString!
// $result = (strCommonPrefixByNeighbour($filesLeastInCommon)); // 12 + 4 = 16 // Far the winner in this category!
echo("Common prefix of all members:\n");
var_dump($result);
// Tests for finding the shortest string in array
/// Arrays
// $empty = array ();
// $noStrings = array (0,1,2,3.0001,4,false,true,77);
// $stringsOnly = array ("one","two","three","four");
// $mixed = array (0,1,2,3.0001,"four",false,true,"seven", 8888);
/// Scenarios
// I list them from fewest to most iterations, which is not necessarily equivalent to slowest to fastest!
// For speed consider the remarks in the code considering the Speed ratio of foreach/for!
//// Fewest iterations (immediate abort on "Found other type", use "for" loop)
// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
// echo("NEW ANALYSIS:\n");
// echo("Result: " . arrayStrLenMin($arr, true, true) . "\n\n");
// }
/* Results:
NEW ANALYSIS:
Result: Array is empty!
NEW ANALYSIS:
Result: Found other type!
NEW ANALYSIS:
Key Length / Notification / Error
0 Found first string member at key with length: 3!
1 3
2 5
3 4
Result: 3
NEW ANALYSIS:
Result: Found other type!
*/
//// Fewer iterations (immediate abort on "Found other type", use "foreach" loop)
// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
// echo("NEW ANALYSIS:\n");
// echo("Result: " . arrayStrLenMin($arr, true, false) . "\n\n");
// }
/* Results:
NEW ANALYSIS:
Result: Array is empty!
NEW ANALYSIS:
Result: Found other type!
NEW ANALYSIS:
Key Length / Notification / Error
0 Found first string member at key with length: 3!
0 3
1 3
2 5
3 4
Result: 3
NEW ANALYSIS:
Result: Found other type!
*/
//// More iterations (No immediate abort on "Found other type", use "for" loop)
// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
// echo("NEW ANALYSIS:\n");
// echo("Result: " . arrayStrLenMin($arr, false, true) . "\n\n");
// }
/* Results:
NEW ANALYSIS:
Result: Array is empty!
NEW ANALYSIS:
Result: No strings found!
NEW ANALYSIS:
Key Length / Notification / Error
0 Found first string member at key with length: 3!
1 3
2 5
3 4
Result: 3
NEW ANALYSIS:
Key Length / Notification / Error
4 Found first string member at key with length: 4!
5 No string!
6 No string!
7 5
8 No string!
Result: 4
*/
//// Most iterations (No immediate abort on "Found other type", use "foreach" loop)
// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
// echo("NEW ANALYSIS:\n");
// echo("Result: " . arrayStrLenMin($arr, false, false) . "\n\n");
// }
/* Results:
NEW ANALYSIS:
Result: Array is empty!
NEW ANALYSIS:
Result: No strings found!
NEW ANALYSIS:
Key Length / Notification / Error
0 Found first string member at key with length: 3!
0 3
1 3
2 5
3 4
Result: 3
NEW ANALYSIS:
Key Length / Notification / Error
4 Found first string member at key with length: 4!
0 No string!
1 No string!
2 No string!
3 No string!
4 4
5 No string!
6 No string!
7 5
8 No string!
Result: 4
*/
I would use a recursive algorithm like this:
1 - get the first string in the array 2 - call the recursive prefix method with the first string as a param 3 - if prefix is empty return no prefix 4 - loop through all the strings in the array 4.1 - if any of the strings does not start with the prefix 4.1.1 - call recursive prefix method with prefix - 1 as a param 4.2 return prefix