Word Count Algorithm in C#

天涯浪人 2021-01-05 02:37

I am looking for a good word count class or function. When I copy and paste something from the internet and compare it with my custom word count algorithm and MS Word it is

7 Answers
  • 2021-01-05 03:01

    Use a regular expression to find words (e.g. [\w]+) and just count the matches

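    public static Regex regex = new Regex(
        "[\\w]+",
        RegexOptions.Multiline
        | RegexOptions.CultureInvariant
        | RegexOptions.Compiled
    );

    // Matches (plural) returns a MatchCollection, which exposes Count;
    // Match returns only the first match and has no Count property.
    int wordCount = regex.Matches(_someString).Count;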

  • 2021-01-05 03:06

    As @astander suggests, you can do a String.Split as follows:

    string[] a = s.Split(
        new char[] { ' ', ',', ';', '.', '!', '"', '(', ')', '?' },
        StringSplitOptions.RemoveEmptyEntries);
    

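    By passing in an array of chars, you can split on several word-break characters at once. StringSplitOptions.RemoveEmptyEntries discards the empty strings produced by consecutive delimiters, so they are not counted as words. The word count is then a.Length.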

  • 2021-01-05 03:08

    Here is a stripped-down version of a C# class I made for counting words, Asian words, characters, etc. Its results are almost the same as Microsoft Word's. I developed the original code for counting words in Microsoft Word documents.

    using System.Text.RegularExpressions;

    namespace BL
    {
        public class WordCount
        {
            public int NonAsianWordCount { get; set; }
            public int AsianWordCount { get; set; }
            public int TextLineCount { get; set; }
            public int TotalWordCount { get; set; }
            public int CharacterCount { get; set; }
            public int CharacterCountWithSpaces { get; set; }

            public void GetCountWords(string s)
            {
                // Each character in this range counts as one word.
                string asianExpression = @"[\u3001-\uFFFF]";
                // Each run of non-whitespace characters counts as one word.
                string englishExpression = @"[\S]+";
                // Each run of carriage returns counts as one line break,
                // so text that uses only "\n" line endings reports 0.
                string lineCountExpression = @"[\r]+";

                // Asian character count
                AsianWordCount = Regex.Matches(s, asianExpression).Count;

                // Blank out the Asian characters so they are not counted again below.
                s = Regex.Replace(s, asianExpression, " ");

                // Non-Asian word count
                MatchCollection collection = Regex.Matches(s, englishExpression);
                NonAsianWordCount = collection.Count;

                // Text line count
                TextLineCount = Regex.Matches(s, lineCountExpression).Count;

                // Total character counts. The "with spaces" figure is an
                // approximation that assumes one space after each non-Asian word.
                CharacterCount = AsianWordCount;
                CharacterCountWithSpaces = CharacterCount;

                foreach (Match word in collection)
                {
                    CharacterCount += word.Value.Length;
                    CharacterCountWithSpaces += word.Value.Length + 1;
                }

                // Total word count
                TotalWordCount = AsianWordCount + NonAsianWordCount;
            }
        }
    }
    
  • 2021-01-05 03:10

    You also need to check for newlines, tabs, and non-breaking spaces. I find it best to copy the source text into a StringBuilder and replace all newlines, tabs, and sentence ending characters with spaces. Then split the string based on spaces.
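
    A minimal sketch of that approach follows. The class name SpaceNormalizingCounter, the method name CountWords, and the exact set of characters treated as separators are assumptions made for illustration; adjust the set to whatever your text needs.

    ```csharp
    using System;
    using System.Text;

    // Hypothetical helper class; the name is illustrative only.
    public static class SpaceNormalizingCounter
    {
        // Assumed separator set: newlines, tab, non-breaking space,
        // and a few common sentence-ending characters.
        private static readonly char[] Separators =
            { '\r', '\n', '\t', '\u00A0', '.', '!', '?' };

        public static int CountWords(string text)
        {
            if (string.IsNullOrEmpty(text)) return 0;

            // Copy the text into a StringBuilder and turn every separator into a space.
            var sb = new StringBuilder(text);
            foreach (char c in Separators)
                sb.Replace(c, ' ');

            // Split on spaces and ignore the empty entries left by consecutive spaces.
            return sb.ToString()
                     .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                     .Length;
        }
    }
    ```

    Because '.' is treated as a separator here, a number such as 3.14 is counted as two words. Remove it from the set if that matters for your input.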

  • 2021-01-05 03:23

    I've just had the same problem in ClipFlair, where I needed to calculate WPM (Words-per-minute) for Movie Captions, so I came up with the following one:

    Define this extension method in a static class, then add a using directive for that class's namespace in any file that needs it. The method is invoked as s.WordCount(), where s is a string (a variable, a constant, or a literal).

    public static int WordCount(this string s)
    {
      int last = s.Length-1;
    
      int count = 0;
      for (int i = 0; i <= last; i++)
      {
        if ( char.IsLetterOrDigit(s[i]) &&
             ((i==last) || char.IsWhiteSpace(s[i+1]) || char.IsPunctuation(s[i+1])) )
          count++;
      }
      return count;
    }
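
    As a rough sketch of how this could be packaged and used for the WPM calculation mentioned above: the class name StringExtensions and the WordsPerMinute helper are hypothetical and not part of ClipFlair; only the WordCount method comes from this answer.

    ```csharp
    using System;

    // Hypothetical container class for the extension methods.
    public static class StringExtensions
    {
        // Counts a word each time a letter or digit is the last character of the
        // string or is followed by whitespace or punctuation.
        public static int WordCount(this string s)
        {
            int last = s.Length - 1;
            int count = 0;
            for (int i = 0; i <= last; i++)
            {
                if (char.IsLetterOrDigit(s[i]) &&
                    (i == last || char.IsWhiteSpace(s[i + 1]) || char.IsPunctuation(s[i + 1])))
                    count++;
            }
            return count;
        }

        // Hypothetical helper: words per minute for a caption shown for the given duration.
        public static double WordsPerMinute(this string caption, TimeSpan duration)
        {
            if (duration <= TimeSpan.Zero) return 0;
            return caption.WordCount() / duration.TotalMinutes;
        }
    }
    ```

    With this in scope, "Hello, world! This is a caption.".WordCount() evaluates to 6. Note that an apostrophe counts as punctuation, so a contraction such as "don't" is counted as two words.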
    
  • 2021-01-05 03:24
    using System;

    public static class WordCount
    {
        public static int Count(string text)
        {
            // Null, empty, or whitespace-only text contains no words.
            if (string.IsNullOrWhiteSpace(text)) { return 0; }

            // Split on spaces (add any other characters you consider word separators)
            // and drop the empty entries produced by consecutive separators.
            return text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;
        }
    }
    