Efficient algorithm for finding all keywords in a text

傲寒 2021-02-08 21:08

I have many strings containing text in many different spellings. I tokenize these strings by searching for keywords, and when a keyword is found I use an associated text

5 answers
  •  旧巷少年郎
    2021-02-08 21:44

    I would use a precompiled regular expression for each group of keywords to match. With RegexOptions.Compiled the pattern is compiled once into matching code instead of being interpreted on every call, so it recognizes the pattern in your string quickly, and is usually faster than calling Contains for each of the possible spellings.

    Namespace: System.Text.RegularExpressions

    In your example:

    • "schw.", "schwa." and "schwarz"
    • new Regex(@"schw(a?\.|arz)", RegexOptions.Compiled)
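
A minimal sketch of how the per-group patterns could drive the tokenizing described in the question. Only the `schw(a?\.|arz)` pattern comes from the answer above; the `KeywordTokenizer` class, the `Tokenize` method, the associated texts "black" and "white", the second keyword group, and the use of `RegexOptions.IgnoreCase` are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeywordTokenizer
{
    // Hypothetical table: one precompiled pattern per keyword group, paired with
    // the associated text to emit when that group matches.
    // Only the first pattern is taken from the answer; the rest is illustrative.
    private static readonly (Regex Pattern, string Token)[] Groups =
    {
        (new Regex(@"schw(a?\.|arz)", RegexOptions.Compiled | RegexOptions.IgnoreCase), "black"),
        (new Regex(@"weiss", RegexOptions.Compiled | RegexOptions.IgnoreCase), "white"),
    };

    // Returns the associated text of every keyword group found in the input,
    // in the order the groups are listed in the table.
    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        foreach (var (pattern, token) in Groups)
        {
            if (pattern.IsMatch(input))
            {
                tokens.Add(token);
            }
        }
        return tokens;
    }
}
```

Because the Regex instances are static fields, each pattern is built once and reused for every input string. Calling `KeywordTokenizer.Tokenize("Farbe: schwa. matt")` would yield a list containing only "black" under these assumptions.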

    Further documentation available here: http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regexoptions(v=VS.90).aspx
