In developing search for a site I am building, I decided to go the cheap and quick way and use Microsoft Sql Server\'s Full Text Search engine instead of something more robust l
I know this thread is way old, but I gave this a try last week and it was a pain in the back side. This is far from perfect, but this is what I came up with.
The snippet generator:
public static string SelectKeywordSnippets(string StringToSnip, string[] Keywords, int SnippetLength)
{
string snippedString = "";
List keywordLocations = new List();
//Get the locations of all keywords
for (int i = 0; i < Keywords.Count(); i++)
keywordLocations.AddRange(SharedTools.IndexOfAll(StringToSnip, Keywords[i], StringComparison.CurrentCultureIgnoreCase));
//Sort locations
keywordLocations.Sort();
//Remove locations which are closer to each other than the SnippetLength
if (keywordLocations.Count > 1)
{
bool found = true;
while (found)
{
found = false;
for (int i = keywordLocations.Count - 1; i > 0; i--)
if (keywordLocations[i] - keywordLocations[i - 1] < SnippetLength / 2)
{
keywordLocations[i - 1] = (keywordLocations[i] + keywordLocations[i - 1]) / 2;
keywordLocations.RemoveAt(i);
found = true;
}
}
}
//Make the snippets
if (keywordLocations.Count > 0 && keywordLocations[0] - SnippetLength / 2 > 0)
snippedString = "... ";
foreach (int i in keywordLocations)
{
int stringStart = Math.Max(0, i - SnippetLength / 2);
int stringEnd = Math.Min(i + SnippetLength / 2, StringToSnip.Length);
int stringLength = Math.Min(stringEnd - stringStart, StringToSnip.Length - stringStart);
snippedString += StringToSnip.Substring(stringStart, stringLength);
if (stringEnd < StringToSnip.Length) snippedString += " ... ";
if (snippedString.Length > 200) break;
}
return snippedString;
}
The function which will find the index of all keywords in the sample text
private static List IndexOfAll(string haystack, string needle, StringComparison Comparison)
{
int pos;
int offset = 0;
int length = needle.Length;
List positions = new List();
while ((pos = haystack.IndexOf(needle, offset, Comparison)) != -1)
{
positions.Add(pos);
offset = pos + length;
}
return positions;
}
It's a bit clumsy in its execution. The way it works is by finding the position of all keywords in the string. Then checking that no keywords are closer to each other than the desired snippet length, so that snippets won't overlap (that's where it's a bit iffy...). And then grabs substrings of the desired length centered around the position of the keywords and stitches the whole thing together.
I know this is years late, but posting just in case it might help somebody coming across this question.