The number of characters of comments in a file (C programming)

太阳男子 2021-01-25 11:52

I can't seem to get it right. I've tried everything, but...

int commentChars() {
char str[256], fileName[256];
FILE *fp;
int i;


do{
    long commentCount=0;
    ffl         


        
3 Answers
  • 2021-01-25 12:30

    This nearly trivial modification of your code fixes several problems in it.

    1. You should not use feof() like that: `while (!feof(file))` is always wrong.
    2. You should not read data that is not part of the string just read.
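
    To make the first point concrete, here is a minimal sketch contrasting the two loop styles. The function names are made up for illustration, and it assumes every line fits in the 256-byte buffer. feof() only becomes true after a read has already failed, so a loop that tests it first runs its body one time too many.

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Problematic pattern: the end-of-file flag is only set once a read has
     * failed, so the loop body runs one extra time on a stale buffer. */
    int count_lines_feof(FILE *fp)
    {
        char buf[256];
        int lines = 0;

        assert(fp != NULL);
        while (!feof(fp)) {
            /* The result is deliberately not checked, which is the bug */
            char *ignored = fgets(buf, sizeof buf, fp);
            (void)ignored;
            lines++;
        }
        return lines;
    }

    /* Preferred pattern: let the result of the read itself control the loop. */
    int count_lines_fgets(FILE *fp)
    {
        char buf[256];
        int lines = 0;

        assert(fp != NULL);
        while (fgets(buf, sizeof buf, fp) != NULL)
            lines++;
        return lines;
    }
    ```

    On a file containing the two lines "a" and "b", count_lines_feof() reports 3 while count_lines_fgets() reports 2.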

    I've also refactored your code so that the function takes a file name, opens, counts and closes it, and reports on what it found.

    #include <stdio.h>
    #include <string.h>
    
    // Revised interface - process a given file name, reporting
    static void commentChars(char const *file)
    {
        char str[256];
        FILE *fp;
        long commentCount = 0;
    
        if (!(fp = fopen(file, "r")))
        {
            fprintf(stderr, "Error! File %s not found\n", file);
            return;
        }
    
        while (fgets(str, sizeof(str), fp) != 0)
        {
            size_t len = strlen(str);
            for (size_t i = 0; i + 1 < len; i++)
            {
                if (str[i] == '/' && str[i + 1] == '/')
                {
                    /* Count only the text after the introducer, not any code before it */
                    commentCount += (long)(len - i - 2);
                    break;
                }
            }
        }
    
        fclose(fp);
    
        printf("%s: Number of characters contained in comments: %ld\n", file, commentCount);
    }
    
    int main(int argc, char **argv)
    {
        if (argc == 1)
            commentChars("/dev/stdin");
        else
        {
            for (int i = 1; i < argc; i++)
                commentChars(argv[i]);
        }
        return 0;
    }
    

    When run on the source code (ccc.c), it yields:

    ccc.c: Number of characters contained in comments: 58
    

    The comment isn't really complete (oops), but it serves to show what goes on. It counts the newline which fgets() preserves as part of the comment, though the // introducer is not counted.
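
    If you would rather not count the line ending, one possible per-line helper is sketched below. The name line_comment_chars is hypothetical, and the sketch assumes the line is a null-terminated string such as fgets() returns. It still treats a // inside a string literal as a comment, just like the loop above.

    ```c
    #include <assert.h>
    #include <string.h>

    /* Number of characters after the first "//" on one line,
     * excluding the trailing "\n" (or "\r\n"). Returns 0 if there is none. */
    size_t line_comment_chars(const char *line)
    {
        const char *start;

        assert(line != NULL);
        start = strstr(line, "//");
        if (start == NULL)
            return 0;
        /* Skip the two-character introducer, then measure up to the line ending */
        return strcspn(start + 2, "\r\n");
    }
    ```

    For the comment line in ccc.c this gives 57 rather than 58, because the newline is no longer included.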

    Dealing with /* comments is harder. You need to spot a slash followed by a star, and then read up to the next star slash character pair. This is probably more easily done using character by character input than line-by-line input; you will, at least, need to be able to interleave character analysis with line input.
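
    As a rough sketch of the character-by-character approach, the function below counts only the characters between the opening and closing delimiters of block comments. The name block_comment_chars is made up for illustration. It deliberately ignores string literals, character constants, // comments and backslash-newline splices, so it is nowhere near ready for the torture test that follows.

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Count the characters inside block comments, excluding the delimiters.
     * An unterminated comment is counted up to end of file. */
    size_t block_comment_chars(FILE *fp)
    {
        size_t count = 0;
        int in_comment = 0;
        int prev = EOF;     /* previous character, or EOF for "nothing usable" */
        int ch;

        assert(fp != NULL);
        while ((ch = fgetc(fp)) != EOF) {
            if (!in_comment) {
                if (prev == '/' && ch == '*') {
                    in_comment = 1;
                    ch = EOF;   /* the opener's star must not help close the comment */
                }
            } else if (prev == '*' && ch == '/') {
                in_comment = 0;
                count--;        /* the terminator's star was counted; take it back */
                ch = EOF;       /* the terminator's slash must not start a new comment */
            } else {
                count++;
            }
            prev = ch;
        }
        return count;
    }
    ```

    Remembering just one previous character is enough here because both delimiters are two characters long. Handling strings and line splices needs more state than that.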

    When you're ready for it, you can try this torture test on your program. It's what I use to check my comment stripper, SCC (which doesn't handle trigraphs — by conscious decision; if the source contains trigraphs, I have a trigraph remover which I use on the source first).

    /*
    @(#)File:            $RCSfile: scc.test,v $
    @(#)Version:         $Revision: 1.7 $
    @(#)Last changed:    $Date: 2013/09/09 14:06:33 $
    @(#)Purpose:         Test file for program SCC
    @(#)Author:          J Leffler
    */
    
    /*TABSTOP=4*/
    
    // -- C++ comment
    
    /*
    Multiline C-style comment
    #ifndef lint
    static const char sccs[] = "@(#)$Id: scc.test,v 1.7 2013/09/09 14:06:33 jleffler Exp $";
    #endif
    */
    
    /*
    Multi-line C-style comment
    with embedded /* in line %C% which should generate a warning
    if scc is run with the -w option
    Two comment starts /* embedded /* in line %C% should generate one warning
    */
    
    /* Comment */ Non-comment /* Comment Again */ Non-Comment Again /*
    Comment again on the next line */
    
    // A C++ comment with a C-style comment marker /* in the middle
    This is plain text under C++ (C99) commenting - but comment body otherwise
    // A C++ comment with a C-style comment end marker */ in the middle
    
    The following C-style comment end marker should generate a warning
    if scc is run with the -w option
    */
    Two of these */ generate */ one warning
    
    It is possible to have both warnings on a single line.
    Eg:
    */ /* /* */ */
    
    SCC has been trained to handle 'q' single quotes in most of
    the aberrant forms that can be used.  '\0', '\\', '\'', '\\
    n' (a valid variant on '\n'), because the backslash followed
    by newline is elided by the token scanning code in CPP before
    any other processing occurs.
    
    This is a legitimate equivalent to '\n' too: '\
    \n', again because the backslash/newline processing occurs early.
    
    The non-portable 'ab', '/*', '*/', '//' forms are handled OK too.
    
    The following quote should generate a warning from SCC; a
    compiler would not accept it.  '
    \n'
    
    " */ /* SCC has been trained to know about strings /* */ */"!
    "\"Double quotes embedded in strings, \\\" too\'!"
    "And \
    newlines in them"
    
    "And escaped double quotes at the end of a string\""
    
    aa '\\
    n' OK
    aa "\""
    aa "\
    \n"
    
    This is followed by C++/C99 comment number 1.
    // C++/C99 comment with \
    continuation character \
    on three source lines (this should not be seen with the -C flag)
    The C++/C99 comment number 1 has finished.
    
    This is followed by C++/C99 comment number 2.
    /\
    /\
    C++/C99 comment (this should not be seen with the -C flag)
    The C++/C99 comment number 2 has finished.
    
    This is followed by regular C comment number 1.
    /\
    *\
    Regular
    comment
    *\
    /
    The regular C comment number 1 has finished.
    
    /\
    \/ This is not a C++/C99 comment!
    
    This is followed by C++/C99 comment number 3.
    /\
    \
    \
    / But this is a C++/C99 comment!
    The C++/C99 comment number 3 has finished.
    
    /\
    \* This is not a C or C++  comment!
    
    This is followed by regular C comment number 2.
    /\
    */ This is a regular C comment *\
    but this is just a routine continuation *\
    and that was not the end either - but this is *\
    \
    /
    The regular C comment number 2 has finished.
    
    This is followed by regular C comment number 3.
    /\
    \
    \
    \
    * C comment */
    The regular C comment number 3 has finished.
    
    Note that \u1234 and \U0010FFF0 are legitimate Unicode characters
    (officially universal character names) that could appear in an
    id\u0065ntifier, a '\u0065' character constant, or in a "char\u0061cter\
     string".  Since these are mapped long after comments are eliminated,
    they cannot affect the interpretation of /* comments */.  In particular,
    none of \u0002A.  \U0000002A, \u002F and \U0000002F ever constitute part
    of a comment delimiter ('*' or '/').
    
    More double quoted string stuff:
    
        if (logtable_out)
        {
        sprintf(logtable_out,
            "insert into %s (bld_id, err_operation, err_expected, err_sql_stmt, err_sql_state)" 
            " values (\"%s\", \"%s\", \"%s\", \"", str_logtable, blade, operation, expected);
        /* watch out for embedded double quotes. */
        }
    
    
    /* Non-terminated C-style comment at the end of the file
    
  • 2021-01-25 12:39

    I think you'd best use regular expressions. They seem scary, but they're really not that bad for things like this. You can always try playing some regex golf to practice ;-)

    I'd approach it as follows:

    • Build a regular expression that captures comments
    • Scan your file for it
    • Count the characters in the match

    Starting from some sample regex code and a note about matching comments in C, I hacked this together. It should let you count all the bytes that are part of a block-style comment /* */, including the delimiters. I only tested it on OS X. I suppose you can handle the rest?

    #include <regex.h>
    #include <stdio.h>
    #include <stdlib.h>
    
    #define MAX_ERROR_MSG 0x1000
    
    int compile_regex(regex_t *r, char * regex_text)
    {
        int status = regcomp (r, regex_text, REG_EXTENDED|REG_NEWLINE|REG_ENHANCED);
        if (status != 0) {
            char error_message[MAX_ERROR_MSG];
            regerror (status, r, error_message, MAX_ERROR_MSG);
            printf ("Regex error compiling '%s': %s\n",
                regex_text, error_message);
            return 1;
        }
        return 0;
    }
    int match_regex(regex_t *r, const char * to_match, long long *nbytes)
    {
        /* Pointer to end of previous match */
        const char *p = to_match;
        /* Maximum number of matches */
        size_t n_matches = 10;
        /* Array of matches */
        regmatch_t m[n_matches];
    
        while(1) {
            int i = 0;
            int nomatch = regexec (r, p, n_matches, m, 0);
            if(nomatch) {
                printf("No more matches.\n");
                return nomatch;
            }
            //Just handle first match (the entire match), don't care
            //about groups
            int start;
            int finish;
            start = m[0].rm_so + (p - to_match);
            finish = m[0].rm_eo + (p - to_match);
            *nbytes += m[0].rm_eo - m[0].rm_so;
    
            printf("match length(bytes) : %lld\n", (long long)(m[0].rm_eo - m[0].rm_so));
            printf("Match: %.*s\n\n", finish - start, to_match + start);
            p += m[0].rm_eo;
        }
        return 0;
    }
    
    int main(int argc, char *argv[])
    {
        regex_t r;
        char regex_text[128] = "/\\*(.|[\r\n])*?\\*/";
        long long comment_bytes = 0;
    
        char *file_contents;
        size_t input_file_size;
        FILE *input_file;
        if(argc != 2) {
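            printf("Usage : %s <filename>\n", argv[0]);
            return 0;
        }
        input_file = fopen(argv[1], "rb");
        if(input_file == NULL) {
            perror(argv[1]);
            return 1;
        }
        fseek(input_file, 0, SEEK_END);
        input_file_size = ftell(input_file);
        rewind(input_file);
        /* One extra byte: regexec() needs a null-terminated string */
        file_contents = malloc(input_file_size + 1);
        if(file_contents == NULL) {
            fclose(input_file);
            return 1;
        }
        input_file_size = fread(file_contents, sizeof(char), input_file_size, input_file);
        file_contents[input_file_size] = '\0';
        fclose(input_file);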
    
        compile_regex(&r, regex_text);
        match_regex(&r, file_contents, &comment_bytes);
        regfree(&r);
        printf("Found %lld bytes in comments\n", comment_bytes);
    
        return 0;
    }
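
    The program above relies on REG_ENHANCED, which is an Apple extension, and on the lazy `*?` quantifier, which plain POSIX extended regular expressions do not define. As a hedged alternative, the sketch below uses a pattern that needs no lazy matching, so it should work with a standard regcomp(). The function name block_comment_bytes is hypothetical. Like the program above, it knows nothing about string literals or // comments.

    ```c
    #include <assert.h>
    #include <regex.h>
    #include <stddef.h>

    /* Total bytes of all block comments in text, delimiters included.
     * Returns -1 if the pattern fails to compile. */
    long long block_comment_bytes(const char *text)
    {
        /* Opener, then any mix of non-star characters or runs of stars
         * followed by something other than a star or slash, then the
         * closing stars and slash. This cannot run past the first terminator. */
        static const char pattern[] = "/\\*([^*]|\\*+[^*/])*\\*+/";
        regex_t re;
        regmatch_t m[1];
        long long total = 0;
        const char *p = text;

        assert(text != NULL);
        if (regcomp(&re, pattern, REG_EXTENDED) != 0)
            return -1;
        /* Every match is at least four bytes long, so p always advances */
        while (regexec(&re, p, 1, m, 0) == 0) {
            total += (long long)(m[0].rm_eo - m[0].rm_so);
            p += m[0].rm_eo;
        }
        regfree(&re);
        return total;
    }
    ```

    REG_NEWLINE is left out on purpose so that `[^*]` can match newlines inside a multi-line comment.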
    
  • 2021-01-25 12:44
    #include <stdio.h>
    
    size_t counter(FILE *fp){
        int ch, chn;
        size_t count = 0;
        enum { none, in_line_comment, in_range_comment, in_string, in_char_constant } status;
    #if 0
        in_range_comment : /* this */
        in_line_comment  : //this
        in_string : "this"
        in_char_constant : ' '
    #endif
    
        status = none;
        while(EOF!=(ch=fgetc(fp))){
            switch(status){
            case in_line_comment :
                if(ch == '\n'){
                    status = none;
                }
                ++count;
                continue;
            case in_range_comment :
                if(ch == '*'){
                    chn = fgetc(fp);
                    if(chn == '/'){
                        status  = none;
                        continue;
                    }
                    ungetc(chn, fp);
                }
                ++count;
                continue;
            case in_string :
                if(ch == '\\'){
                    fgetc(fp);  /* skip the escaped character, whatever it is */
                } else if(ch == '"'){
                    status = none;
                }
                continue;
            case in_char_constant :
                if(ch == '\\'){
                    fgetc(fp);  /* skip the escaped character, whatever it is */
                } else if(ch == '\''){
                    status = none;
                }
                continue;
            case none :
                switch(ch){
                case '/':
                    if('/' == (chn = fgetc(fp))){
                        status = in_line_comment;
                        continue;
                    } else if('*' == chn){
                        status = in_range_comment;
                        continue;
                    } else
                        ungetc(chn, fp);
                    break;
                case '"':
                    status = in_string;
                    break;
                case '\'':
                    status = in_char_constant;
                    break;
                }
            }
        }
        return count;
    }
    
    int main(void){
        FILE *fp = stdin;
        size_t c = counter(fp);
        printf("%zu\n", c);
    
        return 0;
    }
    