Question
I need to read a line of text (terminated by a newline) without making assumptions about its length. So I now face two possibilities:
- Use fgets, check each time whether the last character is a newline, and keep appending to a buffer
- Read each character using fgetc and occasionally realloc the buffer
Intuition tells me the fgetc variant might be slower, but then again I don't see how fgets can do it without examining every character (also, my intuition isn't always that good). The lines are quite large, so performance is important.
I would like to know the pros and cons of each approach. Thank you in advance.
Answer 1:
I suggest using fgets() coupled with dynamic memory allocation, or you can investigate the interface to getline(), which is in the POSIX 2008 standard and available on more recent Linux machines. That does the memory allocation for you. You need to keep tabs on the buffer length as well as its address, so you might even create a structure to hold that information.
Although fgetc() also works, it is marginally fiddlier, but only marginally so. Under the covers, it uses the same mechanisms as fgets(). The internals of fgets() may be able to exploit speedier operations, analogous to strchr(), that are not available when you call fgetc() directly.
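As a rough illustration of the fgets()-plus-dynamic-allocation approach, here is a minimal sketch. The names struct line_buf and read_line_fgets are hypothetical (they do not come from the answer), the starting capacity of 128 bytes and the doubling policy are arbitrary choices, and the code assumes lines contain no embedded NUL bytes, since it relies on strlen().

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type: keeps the buffer address and its sizes together. */
struct line_buf {
    char  *data;  /* heap buffer, NUL-terminated after a successful read */
    size_t cap;   /* bytes allocated */
    size_t len;   /* bytes in use, not counting the terminating NUL */
};

/* Reads one line (keeping the '\n' if present) into lb, growing it as needed.
 * Returns 1 if a line was read, 0 on end of file with nothing read,
 * and -1 on a read error or allocation failure.
 * Assumption: the input has no embedded NUL bytes. */
int read_line_fgets(FILE *fp, struct line_buf *lb)
{
    assert(fp != NULL && lb != NULL);
    lb->len = 0;
    for (;;) {
        /* Grow when there is no room for at least one character plus NUL. */
        if (lb->cap - lb->len < 2) {
            size_t new_cap = lb->cap ? lb->cap * 2 : 128;
            char *p = realloc(lb->data, new_cap);
            if (p == NULL)
                return -1;
            lb->data = p;
            lb->cap = new_cap;
        }
        size_t room = lb->cap - lb->len;
        if (room > INT_MAX)
            room = INT_MAX;  /* fgets() takes an int size */
        if (fgets(lb->data + lb->len, (int)room, fp) == NULL) {
            if (ferror(fp))
                return -1;
            /* End of file: a final line without '\n' still counts. */
            return lb->len > 0 ? 1 : 0;
        }
        lb->len += strlen(lb->data + lb->len);
        if (lb->data[lb->len - 1] == '\n')
            return 1;  /* the whole line is in the buffer */
    }
}
```

Start with a zero-initialized struct line_buf, call read_line_fgets() in a loop, and free() the data member when done. The same buffer is reused for every line, so allocation cost is paid only when a line is longer than any seen before.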
Answer 2:
Does your environment provide the getline(3) function? If so, I'd say go for that.
The big advantage I see is that it allocates the buffer itself (if you want), and will realloc() the buffer you pass in if it's too small. (This means any buffer you pass in must have come from malloc().)
This gets rid of some of the pain of fgets/fgetc, and you can hope that whoever wrote the C library that implements it took care of making it efficient.
Bonus: the man page on Linux has a nice example of how to use it in an efficient manner.
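For reference, a minimal sketch of the getline() calling pattern on a POSIX.1-2008 system. The function longest_line is a hypothetical example (it is not taken from the man page); it only shows one buffer being reused across all lines.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical example: returns the length of the longest line in fp,
 * not counting the newline, reusing one getline() buffer for every line. */
size_t longest_line(FILE *fp)
{
    assert(fp != NULL);
    char *line = NULL;  /* NULL with capacity 0 asks getline() to allocate */
    size_t cap = 0;     /* getline() updates this when it reallocates */
    size_t longest = 0;
    ssize_t n;

    while ((n = getline(&line, &cap, fp)) != -1) {
        size_t len = (size_t)n;
        if (len > 0 && line[len - 1] == '\n')
            len--;  /* getline() keeps the newline; ignore it here */
        if (len > longest)
            longest = len;
    }
    free(line);  /* the caller owns the buffer, even after a failed call */
    return longest;
}
```

Because getline() returns the number of characters read, there is no need for a separate strlen() pass, and lines containing NUL bytes are handled correctly.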
Answer 3:
If performance matters much to you, you generally want to call getc instead of fgetc. The standard tries to make it easier to implement getc as a macro to avoid function call overhead.
Past that, the main thing to deal with is probably your strategy in allocating the buffer. Most people use fixed increments (e.g., when/if we run out of space, allocate another 128 bytes). I'd advise instead using a constant factor, so if you run out of space allocate a buffer that's, say, 1 1/2 times the previous size.
Especially when getc is implemented as a macro, the difference between getc and fgets is usually quite minimal, so you're best off concentrating on other issues.
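A minimal sketch of the getc approach with constant-factor growth might look like the following. The function name read_line_getc and the starting capacity of 64 bytes are assumptions made for illustration; the growth factor of 1.5 follows the suggestion above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line with getc(), growing the buffer by
 * about 1.5x each time it fills up. Returns a malloc'd, NUL-terminated
 * string without the trailing newline, or NULL at end of file (nothing
 * read) or on allocation failure. The caller frees the result. */
char *read_line_getc(FILE *fp)
{
    assert(fp != NULL);
    size_t cap = 64, len = 0;  /* starting capacity is an arbitrary choice */
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    int c;
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 >= cap) {  /* keep one byte free for the NUL */
            size_t new_cap = cap + cap / 2;  /* constant factor of 1.5 */
            char *p = realloc(buf, new_cap);
            if (p == NULL) {
                free(buf);
                return NULL;
            }
            buf = p;
            cap = new_cap;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {
        free(buf);
        return NULL;  /* nothing left to read */
    }
    buf[len] = '\0';
    return buf;
}
```

With a constant growth factor the number of realloc() calls grows logarithmically with the line length, whereas a fixed increment makes it grow linearly, which is why the factor matters more for very long lines.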
Answer 4:
If you can set a maximum line length, even a large one, then one fgets would do the trick. If not, multiple fgets calls will still be faster than multiple fgetc calls because the overhead of the latter will be greater.
A better answer, though, is that it's not worth worrying about the performance difference until and unless you have to. If fgetc is fast enough, what does it matter?
Answer 5:
I would allocate a large buffer and then use fgets, checking, reallocing and repeating if you haven't read to the end of the line.
Each read (via either fgetc or fgets) carries per-call overhead, and whenever the stdio buffer runs dry it also triggers a system call, which takes time. You want to minimize the number of calls, so calling fgets fewer times and iterating in memory is faster.
If you are reading from a file, mmap()ing the file is another option.
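To give an idea of what the mmap() route could look like on a POSIX system, here is a minimal sketch. The function count_lines_mmap is hypothetical and only counts lines; a real reader would process each [p, nl) span instead. It assumes fp refers to a regular file whose contents have already been flushed to it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* Hypothetical helper: maps the regular file behind fp read-only and counts
 * its lines by scanning for '\n' with memchr(). A final line without a
 * newline is counted too. Returns -1 if fstat() or mmap() fails. */
long count_lines_mmap(FILE *fp)
{
    assert(fp != NULL);
    int fd = fileno(fp);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) != 0)
        return -1;
    if (st.st_size == 0)
        return 0;  /* mmap() rejects zero-length mappings */

    size_t size = (size_t)st.st_size;
    char *base = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED)
        return -1;

    long lines = 0;
    const char *p = base;
    const char *end = base + size;
    while (p < end) {
        /* The current line is [p, nl), or [p, end) if no newline remains. */
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        lines++;
        if (nl == NULL)
            break;
        p = nl + 1;
    }
    munmap(base, size);
    return lines;
}
```

No buffer management is needed here because every line is already addressable in memory, but the mapped text is not NUL-terminated, so each line has to be handled as a pointer plus a length.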
Source: https://stackoverflow.com/questions/5186457/c-fgets-versus-fgetc-for-reading-line