C fgets versus fgetc for reading line

社会主义新天地 posted on 2019-11-29 07:48:33

Question


I need to read a line of text (terminated by a newline) without making assumptions about its length. So I now face two possibilities:

  • Use fgets and check each time if the last character is a newline and continuously append to a buffer
  • Read each character using fgetc and occasionally realloc the buffer

Intuition tells me the fgetc variant might be slower, but then again I don't see how fgets can do it without examining every character (also my intuition isn't always that good). The lines are quite large so the performance is important.

I would like to know the pros and cons of each approach. Thank you in advance.


Answer 1:


I suggest using fgets() coupled with dynamic memory allocation - or you can investigate the interface to getline(), which is in the POSIX 2008 standard and available on more recent Linux machines. That does the memory allocation for you. With fgets() you need to keep track of the buffer's length as well as its address, so you might even create a structure to hold both.

Although fgetc() also works, it is marginally fiddlier - but only marginally so. Under the covers, it uses the same mechanisms as fgets(). The internals of fgets() may be able to exploit faster operations, analogous to strchr(), that are not available when you call fgetc() directly.
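As a rough illustration of the fgets() approach, here is a minimal sketch. The names struct line_buf and read_line_fgets are hypothetical, as are the starting size of 128 bytes and the doubling policy. It also assumes lines contain no embedded NUL bytes, since strlen() is used to find how much fgets() stored.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type: keeps the buffer address and capacity together. */
struct line_buf {
    char  *data;  /* malloc'd storage, or NULL before first use */
    size_t cap;   /* number of bytes allocated at data */
};

/* Reads one line (including the '\n', if present) from fp into lb.
 * Returns the line length, or -1 at EOF with nothing read or if
 * allocation fails. */
long read_line_fgets(struct line_buf *lb, FILE *fp)
{
    size_t len = 0;

    assert(lb != NULL && fp != NULL);
    if (lb->data == NULL) {
        lb->cap = 128;  /* arbitrary starting size (assumption) */
        lb->data = malloc(lb->cap);
        if (lb->data == NULL)
            return -1;
    }
    /* Each fgets() call appends at the current end of the partial line.
     * The (int) cast assumes the free space fits in an int. */
    while (fgets(lb->data + len, (int)(lb->cap - len), fp) != NULL) {
        len += strlen(lb->data + len);
        if (len > 0 && lb->data[len - 1] == '\n')
            return (long)len;  /* got a complete line */
        if (len + 1 == lb->cap) {
            /* Buffer is full and no newline yet: double it and go on. */
            char *tmp = realloc(lb->data, lb->cap * 2);
            if (tmp == NULL)
                return -1;
            lb->data = tmp;
            lb->cap *= 2;
        }
    }
    /* fgets() returned NULL: final line without a newline, or plain EOF. */
    return len > 0 ? (long)len : -1;
}
```

The caller starts with a zeroed struct line_buf, reuses it for every line, and calls free() on its data member when done.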




Answer 2:


Does your environment provide the getline(3) function? If so, I'd say go for that.

The big advantage I see is that it allocates the buffer itself (if you want), and will realloc() the buffer you pass in if it's too small. (This means you must pass in either a NULL pointer with a size of 0, or a pointer obtained from malloc().)

This gets rid of some of the pain of fgets/fgetc, and you can hope that whoever wrote the C library that implements it took care of making it efficient.

Bonus: the man page on Linux has a nice example of how to use it in an efficient manner.
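For reference, a minimal sketch of the usual getline() loop follows. The function name longest_line is hypothetical and exists only to give the loop something to compute. The feature-test macro is an assumption about the build environment (some systems expose getline() without it).

```c
#define _POSIX_C_SOURCE 200809L  /* ask <stdio.h> to declare getline() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>  /* ssize_t */

/* Hypothetical helper: returns the length (newline included) of the
 * longest line in fp, reusing one getline() buffer for every line. */
size_t longest_line(FILE *fp)
{
    char   *line = NULL;  /* NULL with cap 0 asks getline() to allocate */
    size_t  cap = 0;      /* getline() updates this when it reallocates */
    size_t  longest = 0;
    ssize_t n;

    assert(fp != NULL);
    while ((n = getline(&line, &cap, fp)) != -1) {
        if ((size_t)n > longest)
            longest = (size_t)n;
    }
    free(line);  /* the buffer belongs to the caller, even after failure */
    return longest;
}
```

Because the same buffer is passed back in on each iteration, getline() only reallocates when a line is longer than any seen so far.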




Answer 3:


If performance matters much to you, you generally want to call getc instead of fgetc. The standard permits getc to be implemented as a macro (one that may evaluate its stream argument more than once), which lets it avoid function call overhead.

Past that, the main thing to deal with is probably your strategy for allocating the buffer. Most people use fixed increments (e.g., when/if we run out of space, allocate another 128 bytes). I'd advise instead using a constant factor, so if you run out of space you allocate a buffer that's, say, 1.5 times the previous size. With a constant factor the number of reallocations grows only logarithmically with the line length, rather than linearly.

Especially when getc is implemented as a macro, the difference between getc and fgets is usually quite minimal, so you're best off concentrating on other issues.
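The getc approach with a constant growth factor might look roughly like the sketch below. The name read_line_getc, the initial capacity of 64 and the choice to drop the trailing newline are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line with getc(), growing the buffer by
 * a factor of 1.5 when it fills up. Returns a malloc'd, NUL-terminated
 * string without the trailing newline, or NULL at EOF with nothing read
 * or on allocation failure. The length is stored in *len_out if non-NULL. */
char *read_line_getc(FILE *fp, size_t *len_out)
{
    size_t cap = 64;  /* arbitrary starting capacity (assumption) */
    size_t len = 0;
    char  *buf;
    int    c;

    assert(fp != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 >= cap) {  /* always keep room for the final NUL */
            size_t new_cap = cap + cap / 2;
            char  *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {  /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    if (len_out != NULL)
        *len_out = len;
    return buf;
}
```

This version allocates a fresh buffer per line for simplicity. Reusing one buffer across calls, as getline() does, would save allocations when reading many lines.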




Answer 4:


If you can set a maximum line length, even a large one, then one fgets would do the trick. If not, multiple fgets calls will still be faster than multiple fgetc calls because the overhead of the latter will be greater.

A better answer, though, is that it's not worth worrying about the performance difference until and unless you have to. If fgetc is fast enough, what does it matter?




Answer 5:


I would allocate a large buffer and then use fgets, checking, reallocing and repeating if you haven't read to the end of the line.

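Each read call (either fgetc or fgets) carries some per-call overhead, and you want to minimize the number of times that happens, so calling fgets fewer times and iterating in memory is typically faster. Note that stdio streams are normally buffered, so neither function makes a system call on every invocation; the underlying read only happens when the stream's buffer runs empty.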

If you are reading from a file, mmap()ing in the file is another option.
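A minimal sketch of the mmap() option, assuming a POSIX system and a regular file: once the file is mapped, line boundaries can be found with memchr() and no copying into a separate buffer is needed. The function name count_lines_mmap is hypothetical, and counting lines stands in for whatever per-line work is actually required.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: counts newline-terminated lines in the regular
 * file open on fd by mapping it read-only. Returns -1 on failure. */
long count_lines_mmap(int fd)
{
    struct stat st;
    long lines = 0;

    assert(fd >= 0);
    if (fstat(fd, &st) == -1)
        return -1;
    if (st.st_size == 0)
        return 0;  /* mmap() rejects a zero-length mapping */

    size_t size = (size_t)st.st_size;
    char *base = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED)
        return -1;

    const char *p = base;
    const char *end = base + size;
    while (p < end) {
        /* Each line is the span [p, nl]; it is not NUL-terminated. */
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        if (nl == NULL)
            break;  /* trailing text without a newline is not counted */
        lines++;
        p = nl + 1;
    }
    munmap(base, size);
    return lines;
}
```

This only works for inputs that can be mapped, so it is not an option for pipes or terminals.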



Source: https://stackoverflow.com/questions/5186457/c-fgets-versus-fgetc-for-reading-line
