word-count

What is the most efficient way to count all of the words in a richtextbox?

核能气质少年 submitted on 2019-12-04 17:56:41
I am writing a text editor and need to provide a live word count. Right now I am using this extension method:

```csharp
public static int WordCount(this string s)
{
    s = s.TrimEnd();
    if (String.IsNullOrEmpty(s))
        return 0;

    int count = 0;
    bool lastWasWordChar = false;
    foreach (char c in s)
    {
        if (Char.IsLetterOrDigit(c) || c == '_' || c == '\'' || c == '-')
        {
            lastWasWordChar = true;
            continue;
        }
        if (lastWasWordChar)
        {
            lastWasWordChar = false;
            count++;
        }
    }
    if (!lastWasWordChar)
        count--;
    return count + 1;
}
```

I have it set so that the word count runs on the richtextbox's text every tenth of a second (if the …
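For comparison, the same "scan for runs of word characters" idea is often written with a regular expression. A minimal Python sketch (illustrative only, not the asker's C# code; it uses the same word-character class as the method above):

```python
import re

# A "word" is a run of letters, digits, underscores, apostrophes, or
# hyphens -- the same character class the C# method above accepts.
WORD_RE = re.compile(r"[\w'-]+")

def word_count(text: str) -> int:
    """Count words by finding every maximal run of word characters."""
    return len(WORD_RE.findall(text))
```

Compiling the pattern once keeps repeated live-count calls cheap.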

How can I count words in complex documents (.rtf, .doc, .odt, etc)?

限于喜欢 submitted on 2019-12-04 09:27:59
I'm trying to write a Python function that, given the path to a document file, returns the number of words in that document. This is fairly easy to do with .txt files, and there are tools that let me hack together support for a few more complex document formats, but I want a really comprehensive solution. Looking at OpenOffice.org's py-uno scripting interface and its list of supported formats, it would seem ideal to load the documents in a headless OOo and call its word-count function. However, I can't find any py-uno tutorials or sample code that go beyond basic document generation, and even …
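Short of scripting py-uno directly, one pragmatic workaround is LibreOffice's command-line converter: convert the document to plain text, then count. A minimal sketch, assuming `soffice` is on the PATH (the function names are my own, not from the question):

```python
import subprocess
import tempfile
from pathlib import Path

def count_words(text: str) -> int:
    # Whitespace-delimited tokens: crude, but format-independent.
    return len(text.split())

def count_words_in_document(path: str) -> int:
    """Convert .rtf/.doc/.odt/... to .txt via headless LibreOffice, then count."""
    with tempfile.TemporaryDirectory() as outdir:
        subprocess.run(
            ["soffice", "--headless", "--convert-to", "txt",
             "--outdir", outdir, path],
            check=True,
        )
        txt = Path(outdir) / (Path(path).stem + ".txt")
        return count_words(txt.read_text(encoding="utf-8", errors="replace"))
```

This trades the richer py-uno API for a much simpler dependency: any format LibreOffice can open becomes countable.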

I need a program for a wordcount [closed]

人盡茶涼 submitted on 2019-12-02 23:45:33
Question (closed 7 years ago as too localized): I need to figure out how to make a program that counts the words in a sentence that the user inputs. The user also inputs the length that each word must …
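The truncated assignment appears to want words filtered by a user-supplied length. A minimal sketch of that reading (the function name and the punctuation handling are my own assumptions, since the original text is cut off):

```python
def words_of_length(sentence: str, length: int) -> int:
    """Count whitespace-separated words whose length, ignoring
    surrounding punctuation, equals the requested length."""
    words = (w.strip(".,;:!?\"'") for w in sentence.split())
    return sum(1 for w in words if len(w) == length)
```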

How can I use the UNIX shell to count the number of times a letter appears in a text file?

故事扮演 submitted on 2019-12-02 23:23:11
I have a few text files and I'd like to count how many times a letter appears in each, using the UNIX shell, in the form of `cat file | ... do stuff ...`. Is there a way I can get the wc command to do this?

```sh
grep -o char filename | wc -l
```

Another alternative:

```sh
tr -d -C X <infile | wc -c
```

where X is the character or string of characters you want to count and infile is the input file.

An alternative to grep:

```sh
sed 's/[^x]//g' filename | tr -d '\012' | wc -c
```

where x is the character you want to count.

There's also awk:

```sh
$ echo -e "hello world\nbye all" | awk -Fl '{c += NF - …
```
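For comparison with the shell pipelines, the same per-character tally can be done in Python with `collections.Counter` (illustrative, not from the original thread):

```python
from collections import Counter

def char_count(text: str, ch: str) -> int:
    """Tally every character once; Counter returns 0 for absent keys."""
    return Counter(text)[ch]

# char_count(open("infile").read(), "x")  mirrors  tr -d -C x <infile | wc -c
```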

Job Token file not found when running Hadoop wordcount example

牧云@^-^@ submitted on 2019-12-02 17:17:55
Question: I just installed Hadoop successfully on a small cluster. Now I'm trying to run the wordcount example, but I'm getting this error:

```
hdfs://localhost:54310/user/myname/test11
12/04/24 13:26:45 INFO input.FileInputFormat: Total input paths to process : 1
12/04/24 13:26:45 INFO mapred.JobClient: Running job: job_201204241257_0003
12/04/24 13:26:46 INFO mapred.JobClient:  map 0% reduce 0%
12/04/24 13:26:50 INFO mapred.JobClient: Task Id : attempt_201204241257_0003_m_000002_0, Status : FAILED
```

Correct word-count of a LaTeX document

早过忘川 submitted on 2019-12-02 14:16:22
I'm currently searching for an application or a script that does a correct word count for a LaTeX document. Up till now, I have only encountered scripts that work on a single file, but what I want is a script that can safely ignore LaTeX keywords and also traverse linked files, i.e. follow \include and \input links, to produce a correct word count for the whole document. With vim, I currently use ggVG then CTRL+G, but obviously that shows the count for the current file only and does not ignore LaTeX keywords. Does anyone know of any script (or application) that can do this job? I use texcount. The …
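texcount is the mature answer here; purely for illustration, the two requirements (ignore commands, follow \input/\include) can be sketched naively in Python. This is far less robust than texcount, and the regexes are simplifying assumptions (no nested braces, no verbatim environments):

```python
import re
from pathlib import Path

INPUT_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")
COMMAND_RE = re.compile(r"\\[a-zA-Z]+(\[[^\]]*\])?(\{[^}]*\})?")
COMMENT_RE = re.compile(r"%.*")

def latex_word_count(path: Path) -> int:
    text = path.read_text(encoding="utf-8")
    count = 0
    # Recurse into included files first (TeX allows omitting the .tex suffix).
    for name in INPUT_RE.findall(text):
        sub = path.parent / (name if name.endswith(".tex") else name + ".tex")
        if sub.exists():
            count += latex_word_count(sub)
    # Drop comments and commands, then count what remains.
    text = COMMENT_RE.sub("", text)
    text = COMMAND_RE.sub(" ", text)
    return count + len(text.split())
```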

Java MapReduce counting by date

爷，独闯天下 submitted on 2019-12-02 08:28:06
I'm new to Hadoop, and I'm trying to write a MapReduce program that counts the top two occurrences of letters by date (grouped by month). So my input is of this kind:

```
2017-06-01, A, B, A, C, B, E, F
2017-06-02, Q, B, Q, F, K, E, F
2017-06-03, A, B, A, R, T, E, E
2017-07-01, A, B, A, C, B, E, F
2017-07-05, A, B, A, G, B, G, G
```

So I'm expecting as a result of this MapReduce program something like:

```
2017-06, A:4, E:4
2017-07, A:4, B:4
```

```java
public class ArrayGiulioTest {

    public static Logger logger = Logger.getLogger(ArrayGiulioTest.class);

    public static class CustomMap extends Mapper …
```
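Before debugging the Hadoop job itself, the grouping logic can be prototyped in plain Python. A sketch of the same "month key → letter counts → top two" pipeline (my own function name, not part of the Hadoop code):

```python
from collections import Counter, defaultdict

def top_two_by_month(lines):
    """Group letters under a YYYY-MM key (the map step), then keep
    each month's two most frequent letters (the reduce step)."""
    per_month = defaultdict(Counter)
    for line in lines:
        date, *letters = (field.strip() for field in line.split(","))
        per_month[date[:7]].update(letters)
    return {month: counts.most_common(2)
            for month, counts in sorted(per_month.items())}
```

Note that ties (June actually has A, B, and E all at 4) come back from `most_common` in first-seen order, so the "A:4, E:4" expectation needs an explicit tie-break rule in the real job too.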

Extracting most frequent words out of a corpus with python

Deadly submitted on 2019-12-02 02:00:06
Maybe this is a stupid question, but I have a problem with extracting the ten most frequent words out of a corpus with Python. This is what I've got so far. (By the way, I work with NLTK for reading a corpus with two subcategories, each with 10 .txt files.)

```python
import re
import string
from nltk.corpus import stopwords
stoplist = stopwords.words('dutch')
from collections import defaultdict
from operator import itemgetter

def toptenwords(mycorpus):
    words = mycorpus.words()
    no_capitals = set([word.lower() for word in words])
    filtered = [word for word in no_capitals if word not in stoplist]
    no_punct = [s …
```
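One concrete problem in the snippet above: `set(...)` collapses duplicates, which throws away the very frequencies you want to rank. A Counter-based sketch of the intended pipeline (shown with a plain word list and stoplist rather than an NLTK corpus object):

```python
import string
from collections import Counter

def top_ten_words(words, stoplist):
    """Lowercase, drop stopwords and pure-punctuation tokens, then
    rank by frequency. No set(): duplicates must survive the cleanup."""
    cleaned = (w.lower() for w in words)
    filtered = (w for w in cleaned
                if w not in stoplist and w not in string.punctuation)
    return Counter(filtered).most_common(10)
```

With an NLTK corpus, `top_ten_words(mycorpus.words(), stopwords.words('dutch'))` would slot into the original function.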