I have a List<string> with 1500 strings. I am currently using the following code to pull out only the strings that start with the string prefix
If you have the list in alphabetical order, you can use a variation of binary search to make it a lot faster.
As a starting point, this will return the index of one of the strings that match the prefix, so then you can look forward and backward in the list to find the rest:
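public static int BinarySearchStartsWith(List<string> words, string prefix, int min, int max) {
    // words must be sorted; min and max are the inclusive bounds of the search range.
    while (max >= min) {
        int mid = (min + max) / 2;
        // Compare only the first prefix.Length characters of the middle word.
        int comp = String.Compare(words[mid].Substring(0, prefix.Length), prefix);
        if (comp < 0) {
            min = mid + 1;   // middle word sorts before the prefix: search the upper half
        } else if (comp > 0) {
            max = mid - 1;   // middle word sorts after the prefix: search the lower half
        } else {
            return mid;      // middle word starts with the prefix
        }
    }
    return -1;   // no word starts with the prefix
}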
int index = BinarySearchStartsWith(theList, "pre", 0, theList.Count - 1);
if (index == -1) {
    // not found
} else {
    // found
}
Note: If the prefix is longer than one of the strings it is compared against, Substring throws an ArgumentOutOfRangeException, so you need to decide how you want to handle that case.
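One possible way to handle both points above (short strings, and collecting every match rather than just one) is to binary-search for the first string that is not less than the prefix and then walk forward while the strings still start with it. This is only a sketch: the names PrefixSearch, LowerBound and FindAllWithPrefix are made up for illustration, and it assumes the list was sorted with StringComparer.Ordinal.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixSearch
{
    // Index of the first element that is >= key under ordinal comparison
    // (a classic "lower bound"); returns words.Count if there is none.
    private static int LowerBound(List<string> words, string key)
    {
        int lo = 0, hi = words.Count;   // search window is [lo, hi)
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (string.CompareOrdinal(words[mid], key) < 0)
                lo = mid + 1;
            else
                hi = mid;
        }
        return lo;
    }

    // Returns every string that starts with prefix.
    // Assumption: words was sorted with StringComparer.Ordinal.
    public static List<string> FindAllWithPrefix(List<string> words, string prefix)
    {
        var result = new List<string>();
        // In ordinal order, all matches sit together starting at the lower bound.
        for (int i = LowerBound(words, prefix);
             i < words.Count && words[i].StartsWith(prefix, StringComparison.Ordinal);
             i++)
        {
            result.Add(words[i]);
        }
        return result;
    }
}
```

Because no Substring call is involved, a prefix longer than some of the strings simply yields fewer (or no) matches instead of throwing. Usage would look like theList.Sort(StringComparer.Ordinal) once, followed by PrefixSearch.FindAllWithPrefix(theList, "pre") for each lookup.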
1500 strings is usually too few for this to matter, but here are some options:
You could search in parallel with a simple divide and conquer: split the list into two (or three, four, ...) parts and search each part in a separate job/thread.
Or store the strings in a (non-binary) tree instead; a lookup will then be O(log n).
If the list is sorted in alphabetical order, you can do a binary search (much the same idea as the previous option).
You can use PLINQ (Parallel LINQ) to make the execution faster:
var newList = list.AsParallel().Where(x => x.StartsWith(prefixText)).ToList();
I would go with LINQ:
var query = list.Where(w => w.StartsWith("prefixText")).ToList();
I assume the fastest way would be to generate a dictionary keyed by every possible prefix of your 1500 strings, effectively precomputing the result of every search that returns a non-empty result. A search is then simply a dictionary lookup that completes in O(1) time. This is a case of trading memory (and initialization time) for speed.
// Requires: using System.Collections.Generic; using System.Linq;
private IDictionary<string, string[]> prefixedStrings;

public void Construct(IEnumerable<string> strings)
{
    // Map every non-empty prefix of every string to the strings that start with it.
    this.prefixedStrings =
        (
            from s in strings
            from i in Enumerable.Range(1, s.Length)
            let p = s.Substring(0, i)
            group s by p
        ).ToDictionary(
            g => g.Key,
            g => g.ToArray());
}

public string[] Search(string prefix)
{
    string[] result;
    if (this.prefixedStrings.TryGetValue(prefix, out result))
        return result;

    return new string[0];   // no string starts with this prefix
}
Since 1500 is not really a huge number, a binary search on a sorted list would probably be enough. Nevertheless, the most efficient algorithms for prefix search are based on a data structure called a trie, or prefix tree. See: http://en.wikipedia.org/wiki/Trie
The following picture demonstrates the idea very briefly:
For a C# implementation, see for instance ".NET DATA STRUCTURES FOR PREFIX STRING SEARCH AND SUBSTRING (INFIX) SEARCH TO IMPLEMENT AUTO-COMPLETION AND INTELLI-SENSE".
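To give a rough feel for the idea, here is a minimal trie sketch. It is not taken from the article mentioned above; the class and member names (Trie, Add, StartsWith) are made up for illustration, and it assumes you only need case-sensitive matching and do not care about the order of the results or about duplicate entries (a string added twice is returned once).

```csharp
using System;
using System.Collections.Generic;

public sealed class Trie
{
    private sealed class Node
    {
        // One child per possible next character.
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWord;   // true if a stored string ends at this node
    }

    private readonly Node root = new Node();

    // Inserts a string, creating one node per character as needed.
    public void Add(string word)
    {
        Node node = root;
        foreach (char c in word)
        {
            if (!node.Children.TryGetValue(c, out Node next))
            {
                next = new Node();
                node.Children[c] = next;
            }
            node = next;
        }
        node.IsWord = true;
    }

    // Returns every stored string that starts with prefix (in no particular order).
    public List<string> StartsWith(string prefix)
    {
        var results = new List<string>();
        Node node = root;
        foreach (char c in prefix)
        {
            // Walk down one node per prefix character; stop if the path does not exist.
            if (!node.Children.TryGetValue(c, out node))
                return results;
        }
        Collect(node, prefix, results);
        return results;
    }

    // Depth-first walk that rebuilds each stored string below the given node.
    private static void Collect(Node node, string current, List<string> results)
    {
        if (node.IsWord)
            results.Add(current);
        foreach (KeyValuePair<char, Node> child in node.Children)
            Collect(child.Value, current + child.Key, results);
    }
}
```

Finding the node for a prefix takes time proportional to the length of the prefix, regardless of how many strings are stored; the remaining cost is collecting the matches themselves. You would build the trie once by calling Add for each of the 1500 strings and then call StartsWith for each lookup.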