Question:
I am trying to write a program that counts the number of characters, words and lines in a text. The text is:
It was a dark and stormy night;
the rain fell in torrents - except
at occasional intervals, when it was
checked by a violent gust of wind
which swept up the streets (for it is
in London that our scene lies),
rattling along the housetops, and fiercely
agitating the scanty flame of the lamps
that struggled against the darkness.
Edward Bulwer-Lytton's novel Paul Clifford.
I keep getting 62 instead of 64; any suggestions?
#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>
int main() {
    int tot_chars = 0; /* total characters */
    int tot_lines = 0; /* total lines */
    int tot_words = 0; /* total words */
    int boolean;
    /* EOF == end of file */
    int n;
    while ((n = getchar()) != EOF) {
        tot_chars++;
        if (isspace(n) && !isspace(getchar())) {
            tot_words++;
        }
        if (n == '\n') {
            tot_lines++;
        }
        if (n == '-') {
            tot_words--;
        }
    }
    printf("Lines, Words, Characters\n");
    printf(" %3d %3d %3d\n", tot_lines, tot_words, tot_chars);
    // Should be 11 64 375
    // currently 11 65 375
    return 0;
}
Answer 1:
There are multiple problems in your code:
- In the test
      if (isspace(n) && !isspace(getchar()))
  you potentially consume a byte from the file and fail to increment tot_chars. Furthermore, you do not increment tot_words if two words are separated by two white-space characters. This causes "darkness." and "Edward" to be counted as a single word.
- You decrement tot_words when you see a hyphen, which is incorrect, as words are separated by white space only. This causes "Bulwer-Lytton's" to be counted as 1 - 1, i.e. zero. Hence you only get 62 words instead of 64.
- On a lesser note, the name n is confusing for a byte read from the file; it is usually a more appropriate name for a count. The idiomatic name for a byte read from a file is c, and int is the correct type, as it accommodates all values of unsigned char plus the special value EOF.
To detect word boundaries, you should keep a state variable and update the word count when the state changes:
#include <ctype.h>
#include <stdio.h>
int main(void) {
    int tot_chars = 0; /* total characters */
    int tot_lines = 0; /* total lines */
    int tot_words = 0; /* total words */
    int in_space = 1;
    int c, last = '\n';
    while ((c = getchar()) != EOF) {
        last = c;
        tot_chars++;
        if (isspace(c)) {
            in_space = 1;
            if (c == '\n') {
                tot_lines++;
            }
        } else {
            tot_words += in_space;
            in_space = 0;
        }
    }
    if (last != '\n') {
        /* count last line if not linefeed terminated */
        tot_lines++;
    }
    printf("Lines, Words, Characters\n");
    printf(" %3d %3d %3d\n", tot_lines, tot_words, tot_chars);
    return 0;
}
Answer 2:
Actually, I now think you have to modify the program. Counting words on the assumption that they are separated by a single space (or other white-space character) will not work if your text uses two or more white-space characters between two words, because each extra separator is also counted as a word even though no actual word is there.
I think your last if block is really messy: you are using ispunct() to decrement tot_words, but the words in your text contain punctuation marks (with no spaces around them), which means those marks are part of the words, so you should not decrement the count for them.
Previously I thought we should check only for the '-' character in the last if block, as it appears with spaces around it earlier in the text, but it is also used in the novel's name without any space. So I think you should remove the last if block completely and, for simplicity of the logic, count a standalone '-' as a word.
I have modified the first if block; this makes your program work correctly even when two or more white-space characters separate words:
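    // boolean must start at 0 (outside a word) before the loop;
    // in the question it is declared but never initialized.
    // isspace() is true for ' ', '\t', '\n', '\v', '\f' and '\r',
    // so there is no need to write (isspace(n) || n == '\n').
    if (isspace(n))
        boolean = 0;            // outside of a word
    else if (boolean == 0) {
        tot_words++;            // first character of a new word
        boolean = 1;            // inside a word
    }
    if (n == '\n')
        tot_lines++;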
Answer 3:
Both of the following conditionals increment your word count on newline characters, which means that every word followed by a newline instead of a space is counted twice:
    if (isspace(n) || n == '\n'){
        tot_words++;
    }
    if (n=='\n'){
        tot_lines++;
        tot_words++;
    }
Removing the || n == '\n' bit alone will not help, because isspace('\n') is already true. Remove the tot_words++ from the second block instead, and you should get the correct count.
Answer 4:
Change
    if (n=='\n'){
        tot_lines++;
        tot_words++;
    }
to
    if (n=='\n'){
        tot_lines++;
    }
You are already counting a word at each newline in
    if (isspace(n) || n == '\n'){
        tot_words++;
    }
so you are effectively incrementing the word counter one time more than required for each line.
Answer 5:
I checked your code and it works fine; I also got the expected total word count. It seems the code has been edited since it was originally posted.
Answer 6:
$ ./a.out " a b " "a b c " "a b c d"
s = a b , words_cnt= 2
s = a b c , words_cnt= 3
s = a b c d, words_cnt= 4
$ ./a.out "It was a dark and stormy night;
> the rain fell in torrents - except
......
Edward Bulwer-Lytton's novel Paul Clifford., words_cnt = 64
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
int
count_words(const char *s)
{
    size_t i, len = strlen(s);   /* compute the length once, not on every iteration */
    int w = 0;
    for (i = 0; i < len; i++)
    {
        if (!isspace((unsigned char)s[i]))
        {
            w++;                 /* start of a word: skip to its end */
            while (s[i] != '\0' && !isspace((unsigned char)s[i]))
            {
                i++;
            }
        }
    }
    return w;
}
int
main(int argc, char *argv[])
{
    int i;
    if (argc < 2)
    {
        printf("[*] Usage: %s <str1> <str2> ...\n", argv[0]);
        return EXIT_FAILURE;
    }
    for (i = 1; i < argc; i++)
    {
        printf("s = %s, words_cnt= %d\n", argv[i], count_words(argv[i]));
    }
    return 0;
}
Source: https://stackoverflow.com/questions/22969076/c-word-count-program