Line counting and abberant results

人走茶凉 提交于 2019-12-13 00:18:23

问题


I'm writing a utility to count the lines in a given file via the Unix command line. Normally this would be dead simple for me, but apparently I'm having a major off night. The goal of this program is to take in an unknown number of files from the command line, read them into a buffer and check for the newline character. Sounds simple?

int size= 4096;

int main(int argc, char *argv[]){
  int fd, i, j, c, fileLines, totalLines;
  char *buf= (char *)malloc(size); //read buffer

  for (i=2; i<argc; i++){ //get first file

    fileLines=1;    

    if ((fd=open(argv[i], O_RDONLY))!= -1){ //open, read, print file count, close
        while ((c= read(fd, buf, size))!= 0){

            for (j=0; j<size; j++){
                if (buf[j] == '\n')
                    fileLines++;
            }
        }

    }
    printf("%s had %d lines of text\n", argv[i], fileLines);
    totalLines+= fileLines;
    close(fd);

  }

  printf("%d lines were counted overall\n", totalLines);    
  return 0;
}

I have two problems. The first is that the first printf statement is never executed outside of the debugger. The second thing is the totalLines printout should be roughly 175K lines, but the printed value is about 767 times larger.

I'm having trouble understanding this, because all the relevant variables have been declared out of scope from their modification, but that still doesn't explain why the first print statemeent and line counter update is ignored outside of the debugger along with the abberant totalLines result

Any help is appreciated.

ANSWER

Two changes were suggested.
The first was to change j<size to j<c. While this was not the solution required, it follows good coding convention

The second was to change i=2 to i=1. The reason I had the original start variable was the way I started the debugger executable. In the gdb command line, I entered in run lc1 f1.txt to start the debugger. This resulted in the arglist having three variables, and I didn't know that run f1.txt was perfectly suitable, since my professor introduced us to gdb by using the first example.


回答1:


Consider: ./program file.txt

argv[0] is "program"
argv[1] is "file.txt"

which means your for loop starts from the wrong index, and if you are passing only 1 file through the cmd line your code will never enter in that loop! It should start at index 1:

for (i=1; i<argc; i++){

Do yourself a favor and initialize all variables when you declare them. Is the only way to ensure that there will be no garbage on those memory locations.




回答2:


You're not initializing totalLines. You increment it inside of your loop, but you don't set it to 0 when you first declare it.

Also, why do you start from i=2? This is the third command-line argument, and the second parameter to your program. Is this what you intended, or did you want to start from the first parameter to your program?

And as others have pointed out, you should have j < c instead of j < size.




回答3:


Your loop is wrong. It should be j=0; j<c; j++. That's probably not directly responsible for the errors you're seeing but will definitely cause problems.

Did you try stepping through the code with a debugger?




回答4:


First, excellent question. :) All the necessary code, well stated, and it's obvious you've done your work. :)

How are you starting your program when in the debugger? I think the argv[2] starting point might be related to not reaching the printf(), but it would depend upon how you're starting. More details below.

A few comments:

int size= 4096;

Typically, C preprocessor macros are used for this kind of magic number. I know your teachers probably said to never use the preprocessor, but idiomatic C would read:

#define SIZE 4096
for (i=2; i<argc; i++){ //get first file

Try i=1 -- argv[0] is the name of the program, argv[1] is going to be the first command line argument -- presumably if someone calls it via ./wc foo you want to count the number of lines in the file foo. :) (Also, you want the loop to terminate. :) Of course, if you're trying to write a replacement for wc -l, then your loop is alright, but not very helpful if someone screws up the arguments. That can safely be kept as a project for later. (If you're curious now, read the getopt(3) manpage. :)

    if ((fd=open(argv[i], O_RDONLY))!= -1){
        while ((c= read(fd, buf, size))!= 0){

            for (j=0; j<size; j++){

You are ending the loop at j<size -- but you only read in c characters in the last block. You're reading left-over garbage on the last block. (I wouldn't be surprised if there are generated files in /proc/ that might return short reads out of convenience for kernel programmers.)

                if (buf[j] == '\n')
                    fileLines++;
            }
        }

    }
    printf("%s had %d lines of text\n", argv[i], fileLines);
    totalLines+= fileLines;

This is the first time you've assigned to totalLines. :) It is liable to have garbage initial value.

    close(fd);

You should probably move the close(fd); call into the if((fd=open())) block; if the open failed, this will call close(-1);. Not a big deal, but if you were checking the close(2) error return (always good practice), it'd return a needless error.

  }

Hope this helps!




回答5:


You're probably aware of wc, but I'll mention it just in case.

I know it doesn't directly help you debug your specific problem, but maybe you could glance at the source code and/or use it to verify that your program is working.




回答6:


You have logical error in for() loop. You should use "bytes read" instead "read up to", what I mean in your code use "c" instead "size" in for()



来源:https://stackoverflow.com/questions/5891838/line-counting-and-abberant-results

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!