C Word Count program

Submitted by 点点圈 on 2019-12-13 16:40:47

Question


I am trying to write a program that counts the number of characters, words and lines in a text. The text is:

It was a dark and stormy night;
the rain fell in torrents - except
at occasional intervals, when it was
checked by a violent gust of wind
which swept up the streets (for it is
in London that our scene lies),
rattling along the housetops, and fiercely
agitating the scanty flame of the lamps
that struggled against the darkness.

  Edward Bulwer-Lytton's novel Paul Clifford.

I keep getting 62 words instead of 64. Any suggestions?

#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>

int main() {
    int tot_chars = 0;     /* total characters */
    int tot_lines = 0;     /* total lines */
    int tot_words = 0;     /* total words */
    int boolean;
    /* EOF == end of file */
    int n;
    while ((n = getchar()) != EOF) {
        tot_chars++;
        if (isspace(n) && !isspace(getchar())) {
            tot_words++;
        }
        if (n == '\n') {
            tot_lines++;
        }
        if (n == '-') {
            tot_words--;
        }
    }
    printf("Lines, Words, Characters\n");
    printf(" %3d %3d %3d\n", tot_lines, tot_words, tot_chars);

    // Expected:  11 64 375
    // Currently: 11 65 375
    return 0;
}

Answer 1:


There are multiple problems in your code:

  • in the test if (isspace(n) && !isspace(getchar())) you may consume a byte from the file without incrementing tot_chars, and that byte is never checked for '\n' or '-' either. Furthermore, you do not increment tot_words when two words are separated by two white-space characters. This causes darkness. and Edward to be counted as a single word.
  • you decrement tot_words when you see a hyphen, which is incorrect because words are separated by white space only. This causes Bulwer-Lytton's to be counted as 1 - 1, i.e. zero. Hence you get only 62 words instead of 64.

  • on a lesser note, the name n is confusing for a byte read from the file; n is more commonly used for a count. The idiomatic name for a byte read from a file is c, and its type should be int so that it can hold every value of unsigned char plus the special value EOF.
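To make the first two bugs concrete, here is a minimal sketch that replays the question's loop over an in-memory string instead of stdin. The helpers next_byte() and question_word_count() are hypothetical names introduced only for this illustration; next_byte() stands in for getchar() and returns EOF at the end of the string. Character and line counting are left out so only the word logic is visible.

```c
#include <ctype.h>
#include <stdio.h>   /* for EOF */

/* Hypothetical stand-in for getchar(): returns the next byte of the string
   as an unsigned char value, or EOF once the terminating '\0' is reached. */
static int next_byte(const char **p)
{
    if (**p == '\0')
        return EOF;
    return (unsigned char)*(*p)++;
}

/* Hypothetical replay of the question's word-counting logic. */
static int question_word_count(const char *s)
{
    int words = 0;
    int n;

    while ((n = next_byte(&s)) != EOF) {
        /* Bug 1: the second read consumes a byte that is never examined again,
           so a second space (or a '-' or '\n') can slip past unnoticed. */
        if (isspace(n) && !isspace(next_byte(&s)))
            words++;
        /* Bug 2: every '-' seen here cancels a word, even inside Bulwer-Lytton's. */
        if (n == '-')
            words--;
    }
    return words;
}
```

With this sketch, "a  b\n" yields 1 because the second space is swallowed by the look-ahead, and "x Bulwer-Lytton's\n" yields 1 because the hyphen cancels the second word. It also shows a third quirk: a final '\n' followed by end of input adds a word, since isspace(EOF) is 0.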

To detect word boundaries, you should use a state and update the word count when the state changes:

#include <ctype.h>
#include <stdio.h>

int main(void) {
    int tot_chars = 0;     /* total characters */
    int tot_lines = 0;     /* total lines */
    int tot_words = 0;     /* total words */
    int in_space = 1;
    int c, last = '\n';

    while ((c = getchar()) != EOF) {
        last = c;
        tot_chars++;
        if (isspace(c)) {
            in_space = 1;
            if (c == '\n') {
                tot_lines++;
            }
        } else {
            tot_words += in_space;
            in_space = 0;
        }
    }
    if (last != '\n') {
        /* count last line if not linefeed terminated */
        tot_lines++;
    }

    printf("Lines, Words, Characters\n");
    printf(" %3d %3d %3d\n", tot_lines, tot_words, tot_chars);

    return 0;
}



Answer 2:


Actually, I now think you have to modify the program. Counting words on the assumption that they are separated by a single space (or other white-space character) will not work if your text uses two or more white-space characters between words, because each extra separator is also counted as a word even though no actual word is there.

Your last if block is really messy: you use ispunct() to decrement tot_words, but the words in your text contain punctuation marks (without spaces), which means the punctuation is part of those words, so you should not decrement the count.

Previously I thought we should check only for the '-' character in the last if block, as it is used in the first paragraph of the text surrounded by spaces. But it is also used in the novel's name without any space, so I think you should remove the last if block completely and, for simplicity, treat a standalone '-' as a word.

I have modified the first if block; this makes your program robust even when two or more white-space characters separate words.

// boolean must be initialized to 0 (outside of a word) before the loop, e.g. int boolean = 0;
if (isspace(n))   // in the "C" locale isspace() matches ' ', '\t', '\n', '\v', '\f', '\r', so no need to write (isspace(n) || n == '\n')
    boolean = 0;  // outside of a word
else if (boolean == 0) {
    tot_words++;
    boolean = 1;  // inside a word
}

if (n == '\n')
    tot_lines++;
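Put together, the flag logic above can be sketched as a self-contained function that counts words in a string rather than on stdin. count_words_flag() and in_word (playing the role of boolean) are hypothetical names chosen for this sketch; the essential detail is that the flag starts at 0, so the very first word is counted.

```c
#include <ctype.h>

/* Hypothetical string-based version of the in-word flag approach. */
static int count_words_flag(const char *s)
{
    int words = 0;
    int in_word = 0;   /* 0 = outside a word, 1 = inside a word */

    for (; *s != '\0'; s++) {
        if (isspace((unsigned char)*s)) {
            in_word = 0;          /* any white space ends the current word */
        } else if (!in_word) {
            words++;              /* first character of a new word */
            in_word = 1;
        }
    }
    return words;
}
```

Runs of white space only reset the flag, so "a  b\n" counts 2, and a '-' surrounded by spaces, as in "torrents - except", is counted as a word of its own (3 words in total).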



Answer 3:


Both of the following conditionals increment your word count on newline characters, which means that every word followed by a newline instead of a space is counted twice:

if (isspace(n) || n == '\n'){
     tot_words++;
}
if (n=='\n'){
     tot_lines++;
     tot_words++;
}

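If you get rid of the second tot_words++ (the one inside the n == '\n' block), you should get the correct count. The || n == '\n' part is redundant rather than harmful, because isspace() already returns true for '\n'.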




Answer 4:


Change

if (n == '\n') {
    tot_lines++;
    tot_words++;
}

to

if (n == '\n') {
    tot_lines++;
}

You are already counting a word at each newline in

if (isspace(n) || n == '\n') {
    tot_words++;
}

So you are effectively incrementing the word counter one extra time for each line.
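The double count described above can be checked with a small sketch that reproduces the two quoted conditionals over a string. separator_word_count() and its double_count switch are hypothetical names for this illustration: with double_count set, the extra tot_words++ in the newline block is kept; with it cleared, the suggested fix is applied.

```c
#include <ctype.h>

/* Hypothetical replay of the quoted conditionals. Note that this approach
   counts white-space separators, not words. */
static int separator_word_count(const char *s, int double_count)
{
    int tot_words = 0;

    for (; *s != '\0'; s++) {
        int n = (unsigned char)*s;
        if (isspace(n) || n == '\n')   /* '\n' test is redundant: isspace('\n') is nonzero */
            tot_words++;
        if (n == '\n' && double_count)
            tot_words++;               /* second increment for the same newline */
    }
    return tot_words;
}
```

For "a b\nc d\n" the original logic gives 6 and the fixed logic gives 4. Because this approach counts separators, it only matches the real word count when every word, including the last one, is followed by exactly one white-space character.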




Answer 5:


I checked your code and it works fine; I also got the desired total word count. It seems the code has been edited since it was originally posted.

Here is the output I got after running the code: [screenshot]




Answer 6:


$ ./a.out " a b " "a b c " "a b c d"
s =  a b , words_cnt= 2
 s = a b c , words_cnt= 3
 s = a b c d, words_cnt= 4

$ ./a.out "It was a dark and stormy night;
> the rain fell in torrents - except
......
  Edward Bulwer-Lytton's novel Paul Clifford., words_cnt = 64

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>


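/* Count maximal runs of non-white-space characters in s. */
int
count_words(const char *s)
{
    size_t i, len = strlen(s);
    int w = 0;

    for (i = 0; i < len; i++)
    {
        /* Cast to unsigned char: passing a negative char to isspace() is undefined. */
        if (!isspace((unsigned char)s[i]))
        {
            /* Start of a word: count it, then skip to the end of the word. */
            w++;
            while (s[i] != '\0' && !isspace((unsigned char)s[i]))
            {
                i++;
            }
        }
    }

    return w;
}

int
main(int argc, char *argv[])
{
    int i;

    if (argc < 2)
    {
        fprintf(stderr, "[*] Usage: %s <str1> <str2> ...\n", argv[0]);
        return EXIT_FAILURE;
    }

    /* Print each argument together with its word count. */
    for (i = 1; i < argc; i++)
    {
        printf("s = %s, words_cnt= %d\n ", argv[i], count_words(argv[i]));
    }

    return 0;
}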


Source: https://stackoverflow.com/questions/22969076/c-word-count-program
