What is the most efficient way to count all of the words in a RichTextBox?

Submitted by 核能气质少年 on 2019-12-04 17:56:41

You could do a simpler word count based on white-space:

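using System;
public static class StringExtensions // extension methods need a static class
{
    // A null separator splits on all whitespace (spaces, tabs, line breaks), not just ' '.
    public static int WordCount(this string s)
    {
        return s.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length;
    }
}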

MSDN provides an example that should give you an accurate word count much faster on large files.

You could also use a very simple regex that matches runs of one or more word characters and/or apostrophes, so that contractions are counted as single words:

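using System.Text.RegularExpressions;
public static class StringExtensions // extension methods need a static class
{
    // Counts runs of word characters and apostrophes, so "don't" is one word.
    public static int WordCount(this string s) =>
        Regex.Matches(s, @"[\w']+").Count;
}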

On the text in question this returns 2141 matches, which is actually more correct than Word's count here, because Word counts the lone asterisk in the sentence "by stabbing a * with her finger" as a word.
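If the count is refreshed often, for example on every TextChanged event of the RichTextBox, one option is to build the Regex once and reuse it instead of passing the pattern string on every call. This is a sketch under that assumption, not part of the original answer: the class and field names are hypothetical, and `RegexOptions.Compiled` trades extra start-up cost for faster matching, which only pays off when the same pattern runs many times.

```csharp
using System.Text.RegularExpressions;

public static class CachedWordCounter
{
    // Built once and reused for every call. Compiled generates code for the
    // pattern up front: slower to construct, faster to match repeatedly.
    private static readonly Regex WordPattern =
        new Regex(@"[\w']+", RegexOptions.Compiled);

    // Same word definition as the [\w']+ regex above: runs of word
    // characters and apostrophes.
    public static int WordCount(string s)
    {
        int count = 0;
        // Walk the matches one at a time instead of building a MatchCollection.
        for (Match m = WordPattern.Match(s); m.Success; m = m.NextMatch())
        {
            count++;
        }
        return count;
    }
}
```

Walking the matches with `Match`/`NextMatch` counts them without keeping every match alive in a `MatchCollection`. For a one-off count the static `Regex.Matches` call is simpler, and the difference may well be negligible.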

Your method is actually faster than the proposed String.Split method: nearly three times faster on x86 and more than twice as fast on x64. I suspect the JIT is skewing your timings. Always run a microbenchmark twice, because JIT compilation takes up the vast majority of the time on the first run. And because String.Split has been NGEN'd (precompiled to native code), it doesn't need to be JIT-compiled and will therefore appear faster on that first run.
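A minimal sketch of that warm-up advice: the hypothetical `MicroBenchmark.Time` helper below calls the counter once untimed, so first-call JIT compilation is excluded, and then times a loop of calls with `Stopwatch`. It is only an illustration. On .NET Core 3.0 and later, tiered compilation can recompile hot methods after their first calls, so more warm-up iterations (or a dedicated harness such as BenchmarkDotNet) give more trustworthy numbers.

```csharp
using System;
using System.Diagnostics;

public static class MicroBenchmark
{
    // Times `iterations` calls of `counter` on `text`. One untimed call runs
    // first so that JIT-compiling the counter is not part of the measurement.
    public static TimeSpan Time(Func<string, int> counter, string text, int iterations)
    {
        counter(text); // warm-up run (not timed)

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            counter(text); // timed runs; the result is deliberately discarded
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

For example, `MicroBenchmark.Time(s => s.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length, text, 1000)` times 1000 split-based counts of some string `text`; timing a second counter the same way and comparing the two `TimeSpan` values gives the kind of comparison described above.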

Your method is also more accurate: String.Split will count 7 words here, because each lone colon becomes a word of its own:

test :     : this is a test

This also makes sense. String.Split doesn't perform any magic, and I would be very surprised if creating an array of many strings were faster than simply iterating over the individual characters of the string. Iterating over a string with foreach is apparently highly optimized: I tried unsafe pointer arithmetic, and it was actually slightly slower than a plain foreach. I really doubt there is any faster way to do this, other than being smart about which sections of your text need word counts.
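For comparison, here is a sketch of the character-iteration approach described above, using a plain foreach and no intermediate strings. The asker's own method is not shown in this thread, so the word definition is an assumption: a word is a maximal run of letters, digits, or apostrophes, and anything else (whitespace, ':' or '*') separates words. This roughly mirrors the `[\w']+` regex, although `\w` also matches a few extra categories such as underscores. The class and method names are hypothetical.

```csharp
public static class ScanWordCounter
{
    // Assumed word definition: letters, digits and apostrophes.
    private static bool IsWordChar(char c) => char.IsLetterOrDigit(c) || c == '\'';

    public static int WordCount(string s)
    {
        int count = 0;
        bool inWord = false;
        foreach (char c in s) // single pass, no substrings or arrays allocated
        {
            bool wordChar = IsWordChar(c);
            if (wordChar && !inWord)
            {
                count++; // a new word starts at this character
            }
            inWord = wordChar;
        }
        return count;
    }
}
```

On the sample line `test :     : this is a test` this returns 5, where the whitespace split returns 7; it also skips the lone asterisk in "by stabbing a * with her finger" and returns 6.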
