C# Finding relevant document snippets for search result display

野性不改 2021-02-04 12:07

In developing search for a site I am building, I decided to go the cheap and quick way and use Microsoft SQL Server's Full Text Search engine instead of something more robust l

8 answers
  •  梦毁少年i
    2021-02-04 13:12

    I know this thread is way old, but I gave this a try last week and it was a pain in the backside. This is far from perfect, but this is what I came up with.

    The snippet generator:

    public static string SelectKeywordSnippets(string StringToSnip, string[] Keywords, int SnippetLength)
        {
            string snippedString = "";
            List<int> keywordLocations = new List<int>();
    
            //Get the locations of all keywords
            for (int i = 0; i < Keywords.Length; i++)
                keywordLocations.AddRange(SharedTools.IndexOfAll(StringToSnip, Keywords[i], StringComparison.CurrentCultureIgnoreCase));
    
            //Sort locations
            keywordLocations.Sort();
    
            //Merge locations which are closer to each other than half the SnippetLength (each close pair is replaced by its midpoint)
            if (keywordLocations.Count > 1)
            {
                bool found = true;
                while (found)
                {
                    found = false;
                    for (int i = keywordLocations.Count - 1; i > 0; i--)
                        if (keywordLocations[i] - keywordLocations[i - 1] < SnippetLength / 2)
                        {
                            keywordLocations[i - 1] = (keywordLocations[i] + keywordLocations[i - 1]) / 2;
    
                            keywordLocations.RemoveAt(i);
    
                            found = true;
                        }
                }
            }
    
            //Make the snippets
            if (keywordLocations.Count > 0 && keywordLocations[0] - SnippetLength / 2 > 0)
                snippedString = "... ";
            foreach (int i in keywordLocations)
            {
                int stringStart = Math.Max(0, i - SnippetLength / 2);
                int stringEnd = Math.Min(i + SnippetLength / 2, StringToSnip.Length);
                int stringLength = Math.Min(stringEnd - stringStart, StringToSnip.Length - stringStart);
                snippedString += StringToSnip.Substring(stringStart, stringLength);
                if (stringEnd < StringToSnip.Length) snippedString += " ... ";
                if (snippedString.Length > 200) break;
            }
    
            return snippedString;
    
        }
    

    The function that finds the indexes of all keywords in the sample text:

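     private static List<int> IndexOfAll(string haystack, string needle, StringComparison Comparison)
        {
            int pos;
            int offset = 0;
            List<int> positions = new List<int>();
            //An empty needle matches at every offset and would loop forever
            if (string.IsNullOrEmpty(needle)) return positions;
            int length = needle.Length;
            while ((pos = haystack.IndexOf(needle, offset, Comparison)) != -1)
            {
                positions.Add(pos);
                //Continue after this match, so overlapping matches are skipped
                offset = pos + length;
            }
            return positions;
        }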
    

    It's a bit clumsy in its execution. It works by finding the positions of all keywords in the string. It then merges any positions that are closer together than half the snippet length into their midpoint, so that snippets are less likely to overlap (that's where it's a bit iffy: positions between half a snippet length and a full snippet length apart still produce overlapping snippets). Finally, it grabs substrings of the desired length centered on the remaining positions and stitches them together, stopping once the result exceeds 200 characters.
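
    For anyone who wants to tighten up the iffy overlap handling, here is a minimal sketch of a different approach: instead of averaging keyword positions, build a window around each hit and merge windows that overlap or touch. This is not the code from above; `SnippetSketch`, `MergeWindows` and `BuildSnippet` are hypothetical names made up for illustration, and the sketch assumes the hit positions are already sorted (for example, the sorted output of `IndexOfAll`) and a compiler with tuple support (C# 7 or later).

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text;

    public static class SnippetSketch
    {
        // Hypothetical helper: one window of up to snippetLength characters
        // around each hit; windows that overlap or touch are merged into one.
        public static List<(int Start, int End)> MergeWindows(
            List<int> sortedHits, int textLength, int snippetLength)
        {
            var windows = new List<(int Start, int End)>();
            int half = snippetLength / 2;
            foreach (int hit in sortedHits)
            {
                int start = Math.Max(0, hit - half);
                int end = Math.Min(textLength, hit + half);
                if (windows.Count > 0 && start <= windows[windows.Count - 1].End)
                {
                    // Overlaps the previous window: extend it instead of adding a new one.
                    var last = windows[windows.Count - 1];
                    windows[windows.Count - 1] = (last.Start, Math.Max(last.End, end));
                }
                else
                {
                    windows.Add((start, end));
                }
            }
            return windows;
        }

        // Hypothetical helper: stitches the windows together, marking skipped text with "...".
        public static string BuildSnippet(string text, List<(int Start, int End)> windows)
        {
            var sb = new StringBuilder();
            for (int i = 0; i < windows.Count; i++)
            {
                var w = windows[i];
                if (i > 0) sb.Append(" ... ");
                else if (w.Start > 0) sb.Append("... ");
                sb.Append(text, w.Start, w.End - w.Start);
            }
            if (windows.Count > 0 && windows[windows.Count - 1].End < text.Length)
                sb.Append(" ...");
            return sb.ToString();
        }
    }

    public static class Program
    {
        public static void Main()
        {
            string text = "The quick brown fox jumps over the lazy dog";
            // Positions of "fox", "jumps" and "dog" in the text above.
            var hits = new List<int> { 16, 20, 40 };
            var windows = SnippetSketch.MergeWindows(hits, text.Length, 10);
            Console.WriteLine(SnippetSketch.BuildSnippet(text, windows));
            // prints "... rown fox jumps ... lazy dog"
        }
    }
    ```

    The windows around "fox" and "jumps" overlap, so they collapse into a single snippet rather than repeating text. Like the original, this still cuts words in half at the window edges, and it has no 200-character cap; both would need to be added for real use.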

    I know this is years late, but posting just in case it might help somebody coming across this question.
