I have lots of strings containing text in lots of different spellings. I am tokenizing these strings by searching for keywords, and if a keyword is found I use an associated text.
I would use a precompiled regular expression for each group of keywords to match. With RegexOptions.Compiled the pattern is compiled once into IL code instead of being interpreted, so it recognizes the pattern in your string quickly, and a single regex per group is usually faster than calling Contains for each of the possible strings.
The classes are in the System.Text.RegularExpressions namespace.
In your example:
new Regex(@"schw(a?\.|arz)", RegexOptions.Compiled)
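To show how this could fit the tokenizing described in the question, here is a minimal sketch rather than a finished implementation. The class name KeywordTokenizer, the Lookup method, the associated text "black", and the use of RegexOptions.IgnoreCase are all assumptions made for illustration; only the pattern itself comes from the example above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class KeywordTokenizer
{
    // Hypothetical table: one precompiled regex per keyword group,
    // paired with the text associated with that group.
    // "black" and RegexOptions.IgnoreCase are assumptions for this sketch.
    private static readonly List<KeyValuePair<Regex, string>> Groups =
        new List<KeyValuePair<Regex, string>>
        {
            new KeyValuePair<Regex, string>(
                new Regex(@"schw(a?\.|arz)",
                          RegexOptions.Compiled | RegexOptions.IgnoreCase),
                "black"),
        };

    // Returns the associated text of the first group whose regex matches
    // anywhere in the input, or null if no group matches.
    public static string Lookup(string input)
    {
        foreach (var group in Groups)
        {
            if (group.Key.IsMatch(input))
                return group.Value;
        }
        return null;
    }
}
```

Because the regexes are stored in a static field, each one is constructed and compiled only once and then reused for every string. Calling KeywordTokenizer.Lookup on a string containing "schw.", "schwa." or "schwarz" returns the associated text, and any other string returns null.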
Further documentation is available here: http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regexoptions(v=VS.90).aspx