High performance “contains” search in list of strings in C#

情话喂你 2020-12-05 07:54

I have a list of approx. 500,000 strings, each approx. 100 characters long. Given a search term, I want to identify all strings in the list that contain the search term. At

7 answers
  • 2020-12-05 08:28

    Have you tried loading your strings into a List<string> and then calling its Contains method?

    var myList = new List<string>();
    //Code to load your list goes here...
    
    var searchTerm = "find this";
    var match = myList.Contains(searchTerm);
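
    One caveat worth checking: List<string>.Contains reports whether some element equals the search term; it does not look inside each element. A minimal sketch of the difference, assuming a small made-up list and C# 9 top-level statements (the sample strings and the names hasExactElement and substringMatches are illustrative, not from the question):

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical sample data; the real list would hold about 500,000 strings.
    var myList = new List<string> { "please find this one", "nothing here", "find this" };
    var searchTerm = "find this";

    // Exact-element match: true here only because one element equals the term.
    bool hasExactElement = myList.Contains(searchTerm);

    // Substring match: every element that contains the term anywhere inside it.
    // string.Contains(string) is an ordinal, case-sensitive comparison.
    List<string> substringMatches = myList.Where(s => s.Contains(searchTerm)).ToList();

    Console.WriteLine(hasExactElement);        // True
    Console.WriteLine(substringMatches.Count); // 2
    ```

    If the goal is "all strings that contain the term", the second form is the one that answers it.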
    
  • 2020-12-05 08:28
    public static bool ContainsFast<T>(this IList<T> list, T item)
    {
        return list.IndexOf(item) >= 0;
    }
    

    Based on the tests I ran, this variation of Contains was about 33% faster on my machine.

  • 2020-12-05 08:28

    According to these benchmarks, the fastest way to check if a string occurs in a string is the following:

    for (int x = 0; x < ss.Length; x++)
        for (int y = 0; y < sf.Length; y++)
            c[y] += ((ss[x].Length - ss[x].Replace(sf[y], String.Empty).Length) / sf[y].Length > 0 ? 1 : 0);
    

    Thus, you could:

    1. Loop through the list using a Parallel.For construct
    2. Use the code above to check whether each string contains what you're looking for. Here ss is the string[] of strings to search, sf is the string[] of strings to search for, and c[y] is the number of strings in ss that contain sf[y].

    Obviously you'd have to adapt this to your List<string> (or whatever data structure you're using).
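
    Combining the two steps could look roughly like the sketch below. It assumes C# 9 top-level statements and made-up sample data, and it uses Interlocked.Increment because several threads may update the same counter at once; none of that comes from the benchmark itself.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Threading;
    using System.Threading.Tasks;

    // Hypothetical sample data standing in for the real inputs.
    var ss = new List<string> { "alpha beta", "beta gamma", "gamma delta" }; // strings to search
    var sf = new[] { "beta", "gamma", "zeta" };                               // terms to search for (must be non-empty)
    var c = new int[sf.Length];                                               // c[y] = number of strings containing sf[y]

    // Outer loop over the strings runs in parallel; the list is only read, never modified.
    Parallel.For(0, ss.Count, x =>
    {
        for (int y = 0; y < sf.Length; y++)
        {
            // Same Replace-based test as above: true when sf[y] occurs at least once in ss[x].
            bool found = (ss[x].Length - ss[x].Replace(sf[y], string.Empty).Length) / sf[y].Length > 0;
            if (found)
            {
                // Atomic increment, since other iterations may touch c[y] concurrently.
                Interlocked.Increment(ref c[y]);
            }
        }
    });

    Console.WriteLine(string.Join(", ", c)); // 2, 2, 0
    ```

    A plain c[y] += 1 inside the parallel body would be a data race, which is why the counter update differs from the sequential version.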

  • 2020-12-05 08:30

    You should try the Dictionary class. Key lookups are much faster than scanning a List because they are hash-based, although ContainsKey only finds keys that equal the search string exactly.

    Dictionary<String, String> ldapDocument = new Dictionary<String, String>();
    //load your list here
    //Sample -> ldapDocument.Add("014548787","014548787");
    var match = ldapDocument.ContainsKey(stringToMatch);
    
  • 2020-12-05 08:39

    I've heard good things about Lucene.NET when it comes to performing quick full-text searches. They've done the work to figure out the fastest data structures and such to use. I'd suggest giving that a shot.

    Otherwise, you might just try something like this:

    var matches = list.AsParallel().Where(s => s.Contains(searchTerm)).ToList();
    

    But it probably won't get you down to 100 ms.

  • 2020-12-05 08:42

    Have you tried the following?

    list.FindAll(x => x.Contains("YourTerm")); // FindAll already returns a List<string>
    

    For some reason, list.AsParallel().Where(...) is slower than list.FindAll(...) on my PC:

    list.AsParallel().Where(x => x.Contains("YourTerm")).ToList();
    

    Hope this will help you.
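
    If you want to compare the two on your own machine, a rough timing harness might look like this. It is only a sketch: the random data generator, the fixed seed, and the search term "abc" are assumptions for illustration, it uses C# 9 top-level statements, and a single Stopwatch run is not a rigorous benchmark.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;

    // Synthetic data sized like the question: 500,000 strings of 100 characters each.
    var random = new Random(12345);
    const string alphabet = "abcdefghijklmnopqrstuvwxyz ";
    var list = new List<string>(500_000);
    for (int i = 0; i < 500_000; i++)
    {
        var chars = new char[100];
        for (int j = 0; j < chars.Length; j++)
            chars[j] = alphabet[random.Next(alphabet.Length)];
        list.Add(new string(chars));
    }
    string term = "abc";

    // Sequential scan on a single thread.
    var sw = Stopwatch.StartNew();
    List<string> sequential = list.FindAll(x => x.Contains(term));
    sw.Stop();
    Console.WriteLine($"FindAll:    {sequential.Count} matches in {sw.ElapsedMilliseconds} ms");

    // PLINQ scan; result order is not guaranteed to match the list order.
    sw.Restart();
    List<string> parallel = list.AsParallel().Where(x => x.Contains(term)).ToList();
    sw.Stop();
    Console.WriteLine($"AsParallel: {parallel.Count} matches in {sw.ElapsedMilliseconds} ms");
    ```

    Both calls should report the same number of matches; which one is faster will depend on the core count, the data, and whether the first run is paying JIT and thread-pool warm-up costs.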
