I have a List with 1500 strings. I am now using the following code to pull out only the strings that start with the string prefix.
The question to me is whether you'll need to do this once or multiple times.
If you only build the StartsWithPrefix list once, you can't get faster than leaving the original list as is and doing myList.Where(s => s.StartsWith(prefix)). This looks at every string once, so it's O(n).
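As a minimal sketch of that single-pass approach (the class and method names here are made up for illustration, and ordinal comparison is an assumption on my part; the one-argument StartsWith is culture-sensitive, which is usually slower and may not be what you want):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinearPrefixFilter
{
    // One pass over the source: O(n) StartsWith calls, no preprocessing.
    // StringComparison.Ordinal is an assumption; pick the comparison that
    // matches how your strings should be compared.
    public static List<string> StartsWithPrefix(IEnumerable<string> source, string prefix)
    {
        return source
            .Where(s => s.StartsWith(prefix, StringComparison.Ordinal))
            .ToList();
    }
}
```

For 1500 strings this is likely to be fast enough unless the lookup runs very frequently, so it is worth measuring before reaching for anything more elaborate.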
If you need to find the StartsWithPrefix list several times, or you want to add or remove strings from the original list and keep the StartsWithPrefix list up to date, then you should sort the original list and use binary search. This costs sort time + search time = O(n log n) + 2 * O(log n). The sort is paid once; each later lookup costs only the two O(log n) searches plus the time to copy out the matches.
If you use the binary search method, you find the indexes of the first and last occurrences of your prefix via search. Then do mySortedList.Skip(n).Take(m - n + 1), where n is the first index and m is the last index (both inclusive).
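The two searches can be sketched roughly as follows. This is only an illustration under stated assumptions: the list must already be sorted with StringComparer.Ordinal, the names SortedPrefixSearch and StartsWithPrefix are hypothetical, and it uses GetRange with an exclusive end index instead of Skip/Take.

```csharp
using System;
using System.Collections.Generic;

public static class SortedPrefixSearch
{
    // Assumes 'sorted' was sorted with StringComparer.Ordinal. In that order,
    // all strings sharing a prefix form one contiguous run.
    public static List<string> StartsWithPrefix(List<string> sorted, string prefix)
    {
        // First search: lowest index whose element is >= prefix.
        int lo = 0, hi = sorted.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (string.CompareOrdinal(sorted[mid], prefix) < 0) lo = mid + 1;
            else hi = mid;
        }
        int first = lo;

        // Second search: lowest index at or after 'first' whose element
        // no longer starts with the prefix (exclusive end of the run).
        hi = sorted.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (sorted[mid].StartsWith(prefix, StringComparison.Ordinal)) lo = mid + 1;
            else hi = mid;
        }

        // Copy out the run; an empty range means no matches.
        return sorted.GetRange(first, lo - first);
    }
}
```

To use it, sort once with myList.Sort(StringComparer.Ordinal) and then call StartsWithPrefix for each prefix. If the list was sorted with a different comparer, such as the culture-sensitive default, the run of matches is not guaranteed to line up with these ordinal comparisons.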
Wait a minute, we're using the wrong tool for the job. Use a Trie! If you put all your strings into a Trie instead of the list, all you have to do is walk down the trie with your prefix and grab all the words underneath that node.
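A bare-bones trie along those lines might look like the sketch below. The type name SimpleTrie and its members are invented for this example, and it stores each distinct word only once, so duplicates in the original list collapse.

```csharp
using System.Collections.Generic;
using System.Text;

public sealed class SimpleTrie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWord; // true if a stored word ends at this node
    }

    private readonly Node root = new Node();

    public void Add(string word)
    {
        Node node = root;
        foreach (char c in word)
        {
            Node next;
            if (!node.Children.TryGetValue(c, out next))
            {
                next = new Node();
                node.Children.Add(c, next);
            }
            node = next;
        }
        node.IsWord = true;
    }

    public List<string> StartsWith(string prefix)
    {
        var results = new List<string>();
        Node node = root;
        // Walk down the trie along the prefix; stop early if it is absent.
        foreach (char c in prefix)
        {
            if (!node.Children.TryGetValue(c, out node)) return results;
        }
        Collect(node, new StringBuilder(prefix), results);
        return results;
    }

    // Depth-first walk that gathers every word at or below 'node'.
    private static void Collect(Node node, StringBuilder current, List<string> results)
    {
        if (node.IsWord) results.Add(current.ToString());
        foreach (KeyValuePair<char, Node> child in node.Children)
        {
            current.Append(child.Key);
            Collect(child.Value, current, results);
            current.Length--; // backtrack
        }
    }
}
```

Finding the prefix node costs time proportional to the prefix length, independent of how many strings are stored; collecting the matches then costs time proportional to the size of the subtree under that node. The order of the results follows dictionary enumeration order, so sort them if order matters.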
Several approaches were analyzed to achieve a small memory footprint and high performance. The winner: all prefixes are stored in a dictionary, where the key is a prefix and the value is the list of items matching that prefix.
Here is a simple implementation of this algorithm:
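using System;
using System.Collections.Generic;
using System.Linq;

// Despite the name, this is a prefix dictionary rather than a node-based trie:
// every prefix of every key maps to the list of items whose key starts with it.
public class Trie<TItem>
{
    #region Constructors

    public Trie(
        IEnumerable<TItem> items,
        Func<TItem, string> keySelector,
        IComparer<TItem> comparer)
    {
        this.KeySelector = keySelector;
        this.Comparer = comparer;
        // Emit one (prefix, item) pair per prefix length, then group by prefix.
        this.Items = (from item in items
                      from i in Enumerable.Range(1, this.KeySelector(item).Length)
                      let key = this.KeySelector(item).Substring(0, i)
                      group item by key)
                     .ToDictionary(group => group.Key, group => group.ToList());
    }

    #endregion

    #region Properties

    protected Dictionary<string, List<TItem>> Items { get; set; }
    protected Func<TItem, string> KeySelector { get; set; }
    protected IComparer<TItem> Comparer { get; set; }

    #endregion

    #region Methods

    // A single dictionary lookup. Note that this returns the internal list,
    // so callers should treat the result as read-only.
    public List<TItem> Retrieve(string prefix)
    {
        List<TItem> matches;
        return this.Items.TryGetValue(prefix, out matches)
            ? matches
            : new List<TItem>();
    }

    public void Add(TItem item)
    {
        // All prefixes of the item's key, from length 1 up to the full key.
        var keys = (from i in Enumerable.Range(1, this.KeySelector(item).Length)
                    let key = this.KeySelector(item).Substring(0, i)
                    select key).ToList();
        keys.ForEach(key =>
        {
            if (!this.Items.ContainsKey(key))
            {
                this.Items.Add(key, new List<TItem> { item });
            }
            // Only add the item if no equal item is already stored under this prefix.
            else if (this.Items[key].All(x => this.Comparer.Compare(x, item) != 0))
            {
                this.Items[key].Add(item);
            }
        });
    }

    public void Remove(TItem item)
    {
        // Scans every prefix; iterate over a copy of the keys so that
        // entries can be removed from the dictionary inside the loop.
        this.Items.Keys.ToList().ForEach(key =>
        {
            if (this.Items[key].Any(x => this.Comparer.Compare(x, item) == 0))
            {
                this.Items[key].RemoveAll(x => this.Comparer.Compare(x, item) == 0);
                // Drop prefixes that no longer have any items.
                if (this.Items[key].Count == 0)
                {
                    this.Items.Remove(key);
                }
            }
        });
    }

    #endregion
}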
You can accelerate a bit by comparing the first character before invoking StartsWith:
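// Assumes prefixText is non-empty; myList stands in for <MYLIST>.
char first = char.ToUpperInvariant(prefixText[0]);
foreach (string a in myList)
{
    // Skip empty strings (a[0] would throw) and compare the first character
    // ignoring case, so the pre-check agrees with the case-insensitive
    // StartsWith below. OrdinalIgnoreCase replaces the original
    // culture-sensitive StartsWith(prefixText, true, null).
    if (a.Length > 0 && char.ToUpperInvariant(a[0]) == first)
    {
        if (a.StartsWith(prefixText, StringComparison.OrdinalIgnoreCase))
        {
            newlist.Add(a);
        }
    }
}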
Have you tried implementing a Dictionary and comparing the results? Or, if you do put the entries in alphabetical order, try a binary search.