Which functions in the C standard library commonly encourage bad practice? [closed]

This is inspired by this question and the comments on one particular answer in that I learnt that strncpy is not a very safe string handling function in C and that it pads zeros, until it reaches n, something I was unaware of.

Specifically, to quote R..

strncpy does not null-terminate, and does null-pad the whole remainder of the destination buffer, which is a huge waste of time. You can work around the former by adding your own null padding, but not the latter. It was never intended for use as a "safe string handling" function, but for working with fixed-size fields in Unix directory tables and database files. snprintf(dest, n, "%s", src) is the only correct "safe strcpy" in standard C, but it's likely to be a lot slower. By the way, truncation in itself can be a major bug and in some cases might lead to privilege elevation or DoS, so throwing "safe" string functions that truncate their output at a problem is not a way to make it "safe" or "secure". Instead, you should ensure that the destination buffer is the right size and simply use strcpy (or better yet, memcpy if you already know the source string length).

And from Jonathan Leffler

Note that strncat() is even more confusing in its interface than strncpy() - what exactly is that length argument, again? It isn't what you'd expect based on what you supply strncpy() etc - so it is more error prone even than strncpy(). For copying strings around, I'm increasingly of the opinion that there is a strong argument that you only need memmove() because you always know all the sizes ahead of time and make sure there's enough space ahead of time. Use memmove() in preference to any of strcpy(), strcat(), strncpy(), strncat(), memcpy().

So, I'm clearly a little rusty on the C standard library. Therefore, I'd like to pose the question:

What C standard library functions are used inappropriately/in ways that may cause/lead to security problems/code defects/inefficiencies?

In the interests of objectivity, I have a number of criteria for an answer:

Please, if you can, cite design reasons behind the function in question i.e. its intended purpose.
Please highlight the misuse to which the code is currently put.
Please state why that misuse may lead towards a problem. I know that should be obvious but it prevents soft answers.

Please avoid:

Debates over naming conventions of functions (except where this unequivocably causes confusion).
"I prefer x over y" - preference is ok, we all have them but I'm interested in actual unexpected side effects and how to guard against them.

As this is likely to be considered subjective and has no definite answer I'm flagging for community wiki straight away.

I am also working as per C99.

mizo

A common pitfall with the strtok() function is to assume that the parsed string is left unchanged, while it actually replaces the separator character with '\0'.

Also, strtok() is used by making subsequent calls to it, until the entire string is tokenized. Some library implementations store strtok()'s internal status in a global variable, which may induce some nasty suprises, if strtok() is called from multiple threads at the same time.

The CERT C Secure Coding Standard lists many of these pitfalls you asked about.

What C standard library functions are used inappropriately/in ways that may cause/lead to security problems/code defects/inefficiencies ?

I'm gonna go with the obvious :

char *gets(char *s);

With its remarkable particularity that it's simply impossible to use it appropriately.

In almost all cases, atoi() should not be used (this also applies to atof(), atol() and atoll()).

This is because these functions do not detect out-of-range errors at all - the standard simply says "If the value of the result cannot be represented, the behavior is undefined.". So the only time they can be safely used is if you can prove that the input will certainly be within range (for example, if you pass a string of length 4 or less to atoi(), it cannot be out of range).

Instead, use one of the strtol() family of functions.

Let us extend the question to interfaces in a broader sense.

errno:

technically it is not even clear what it is, a variable, a macro, an implicit function call? In practice on modern systems it is mostly a macro that transforms into a function call to have a thread specific error state. It is evil:

because it may cause overhead for the caller to access the value, to check the "error" (which might just be an exceptional event)
because it even imposes at some places that the caller clears this "variable" before making a library call
because it implements a simple error return by setting a global state, of the library.

The forthcoming standard gets the definition of errno a bit more straight, but these uglinesses remain

There is often a strtok_r.

For realloc, if you need to use the old pointer, it's not that hard to use another variable. If your program fails with an allocation error, then cleaning up the old pointer is often not really necessary.

I would put printf and scanf pretty high up on this list. The fact that you have to get the formatting specifiers exactly correct makes these functions tricky to use and extremely easy to get wrong. It's also very hard to avoid buffer overruns when reading data out. Moreover, the "printf format string vulnerability" has probably caused countless security holes when well-intentioned programmers specify client-specified strings as the first argument to printf, only to find the stack smashed and security compromised many years down the line.

j_random_hacker

Any of the functions that manipulate global state, like gmtime() or localtime(). These functions simply can't be used safely in multiple threads.

EDIT: rand() is in the same category it would seem. At least there are no guarantees of thread-safety, and on my Linux system the man page warns that it is non-reentrant and non-threadsafe.

Jonathan Leffler

One of my bêtes noire is strtok(), because it is non-reentrant and because it hacks the string it is processing into pieces, inserting NUL at the end of each token it isolates. The problems with this are legion; it is distressingly often touted as a solution to a problem, but is as often a problem itself. Not always - it can be used safely. But only if you are careful. The same is true of most functions, with the notable exception of gets() which cannot be used safely.

There's already one answer about realloc, but I have a different take on it. A lot of time, I've seen people write realloc when they mean free; malloc - in other words, when they have a buffer full of trash that needs to change size before storing new data. This of course leads to potentially-large, cache-thrashing memcpy of trash that's about to be overwritten.

If used correctly with growing data (in a way that avoids worst-case O(n^2) performance for growing an object to size n, i.e. growing the buffer geometrically instead of linearly when you run out of space), realloc has doubtful benefit over simply doing your own new malloc, memcpy, and free cycle. The only way realloc can ever avoid doing this internally is when you're working with a single object at the top of the heap.

If you like to zero-fill new objects with calloc, it's easy to forget that realloc won't zero-fill the new part.

And finally, one more common use of realloc is to allocate more than you need, then resize the allocated object down to just the required size. But this can actually be harmful (additional allocation and memcpy) on implementations that strictly segregate chunks by size, and in other cases might increase fragmentation (by splitting off part of a large free chunk to store a new small object, instead of using an existing small free chunk).

I'm not sure if I'd say realloc encourages bad practice, but it's a function I'd watch out for.

How about the malloc family in general? The vast majority of large, long-lived programs I've seen use dynamic memory allocation all over the place as if it were free. Of course real-time developers know this is a myth, and careless use of dynamic allocation can lead to catastrophic blow-up of memory usage and/or fragmentation of address space to the point of memory exhaustion.

In some higher-level languages without machine-level pointers, dynamic allocation is not so bad because the implementation can move objects and defragment memory during the program's lifetime, as long as it can keep references to these objects up-to-date. A non-conventional C implementation could do this too, but working out the details is non-trivial and it would incur a very significant cost in all pointer dereferences and make pointers rather large, so for practical purposes, it's not possible in C.

My suspicion is that the correct solution is usually for long-lived programs to perform their small routine allocations as usual with malloc, but to keep large, long-lived data structures in a form where they can be reconstructed and replaced periodically to fight fragmentation, or as large malloc blocks containing a number of structures that make up a single large unit of data in the application (like a whole web page presentation in a browser), or on-disk with a fixed-size in-memory cache or memory-mapped files.

On a wholly different tack, I've never really understood the benefits of atan() when there is atan2(). The difference is that atan2() takes two arguments, and returns an angle anywhere in the range -π..+π. Further, it avoids divide by zero errors and loss of precision errors (dividing a very small number by a very large number, or vice versa). By contrast, the atan() function only returns a value in the range -π/2..+π/2, and you have to do the division beforehand (I don't recall a scenario where atan() could be used without there being a division, short of simply generating a table of arctangents). Providing 1.0 as the divisor for atan2() when given a simple value is not pushing the limits.

Another answer, since these are not really related, rand:

it is of unspecified random quality
it is not re-entrant

Some of this functions are modifying some global state. (In windows) this state is shared per single thread - you can get unexpected result. For example, the first call of rand in every thread will give the same result, and it requires some care to make it pseudorandom, but deterministic (for debug purposes).

arsenm

basename() and dirname() aren't threadsafe.

来源：https://stackoverflow.com/questions/4588581/which-functions-in-the-c-standard-library-commonly-encourage-bad-practice

标签

security

c99

standard-library