Which functions in the C standard library commonly encourage bad practice? [closed]

不打扰是莪最后的温柔 提交于 2019-11-28 02:50:57
mizo

A common pitfall with the strtok() function is to assume that the parsed string is left unchanged, while it actually replaces the separator character with '\0'.

Also, strtok() is used by making subsequent calls to it, until the entire string is tokenized. Some library implementations store strtok()'s internal status in a global variable, which may induce some nasty suprises, if strtok() is called from multiple threads at the same time.

The CERT C Secure Coding Standard lists many of these pitfalls you asked about.

What C standard library functions are used inappropriately/in ways that may cause/lead to security problems/code defects/inefficiencies ?

I'm gonna go with the obvious :

char *gets(char *s);

With its remarkable particularity that it's simply impossible to use it appropriately.

In almost all cases, atoi() should not be used (this also applies to atof(), atol() and atoll()).

This is because these functions do not detect out-of-range errors at all - the standard simply says "If the value of the result cannot be represented, the behavior is undefined.". So the only time they can be safely used is if you can prove that the input will certainly be within range (for example, if you pass a string of length 4 or less to atoi(), it cannot be out of range).

Instead, use one of the strtol() family of functions.

Let us extend the question to interfaces in a broader sense.

errno:

technically it is not even clear what it is, a variable, a macro, an implicit function call? In practice on modern systems it is mostly a macro that transforms into a function call to have a thread specific error state. It is evil:

  • because it may cause overhead for the caller to access the value, to check the "error" (which might just be an exceptional event)
  • because it even imposes at some places that the caller clears this "variable" before making a library call
  • because it implements a simple error return by setting a global state, of the library.

The forthcoming standard gets the definition of errno a bit more straight, but these uglinesses remain

There is often a strtok_r.

For realloc, if you need to use the old pointer, it's not that hard to use another variable. If your program fails with an allocation error, then cleaning up the old pointer is often not really necessary.

I would put printf and scanf pretty high up on this list. The fact that you have to get the formatting specifiers exactly correct makes these functions tricky to use and extremely easy to get wrong. It's also very hard to avoid buffer overruns when reading data out. Moreover, the "printf format string vulnerability" has probably caused countless security holes when well-intentioned programmers specify client-specified strings as the first argument to printf, only to find the stack smashed and security compromised many years down the line.

j_random_hacker

Any of the functions that manipulate global state, like gmtime() or localtime(). These functions simply can't be used safely in multiple threads.

EDIT: rand() is in the same category it would seem. At least there are no guarantees of thread-safety, and on my Linux system the man page warns that it is non-reentrant and non-threadsafe.

Jonathan Leffler

One of my bêtes noire is strtok(), because it is non-reentrant and because it hacks the string it is processing into pieces, inserting NUL at the end of each token it isolates. The problems with this are legion; it is distressingly often touted as a solution to a problem, but is as often a problem itself. Not always - it can be used safely. But only if you are careful. The same is true of most functions, with the notable exception of gets() which cannot be used safely.

There's already one answer about realloc, but I have a different take on it. A lot of time, I've seen people write realloc when they mean free; malloc - in other words, when they have a buffer full of trash that needs to change size before storing new data. This of course leads to potentially-large, cache-thrashing memcpy of trash that's about to be overwritten.

If used correctly with growing data (in a way that avoids worst-case O(n^2) performance for growing an object to size n, i.e. growing the buffer geometrically instead of linearly when you run out of space), realloc has doubtful benefit over simply doing your own new malloc, memcpy, and free cycle. The only way realloc can ever avoid doing this internally is when you're working with a single object at the top of the heap.

If you like to zero-fill new objects with calloc, it's easy to forget that realloc won't zero-fill the new part.

And finally, one more common use of realloc is to allocate more than you need, then resize the allocated object down to just the required size. But this can actually be harmful (additional allocation and memcpy) on implementations that strictly segregate chunks by size, and in other cases might increase fragmentation (by splitting off part of a large free chunk to store a new small object, instead of using an existing small free chunk).

I'm not sure if I'd say realloc encourages bad practice, but it's a function I'd watch out for.

How about the malloc family in general? The vast majority of large, long-lived programs I've seen use dynamic memory allocation all over the place as if it were free. Of course real-time developers know this is a myth, and careless use of dynamic allocation can lead to catastrophic blow-up of memory usage and/or fragmentation of address space to the point of memory exhaustion.

In some higher-level languages without machine-level pointers, dynamic allocation is not so bad because the implementation can move objects and defragment memory during the program's lifetime, as long as it can keep references to these objects up-to-date. A non-conventional C implementation could do this too, but working out the details is non-trivial and it would incur a very significant cost in all pointer dereferences and make pointers rather large, so for practical purposes, it's not possible in C.

My suspicion is that the correct solution is usually for long-lived programs to perform their small routine allocations as usual with malloc, but to keep large, long-lived data structures in a form where they can be reconstructed and replaced periodically to fight fragmentation, or as large malloc blocks containing a number of structures that make up a single large unit of data in the application (like a whole web page presentation in a browser), or on-disk with a fixed-size in-memory cache or memory-mapped files.

On a wholly different tack, I've never really understood the benefits of atan() when there is atan2(). The difference is that atan2() takes two arguments, and returns an angle anywhere in the range -π..+π. Further, it avoids divide by zero errors and loss of precision errors (dividing a very small number by a very large number, or vice versa). By contrast, the atan() function only returns a value in the range -π/2..+π/2, and you have to do the division beforehand (I don't recall a scenario where atan() could be used without there being a division, short of simply generating a table of arctangents). Providing 1.0 as the divisor for atan2() when given a simple value is not pushing the limits.

Another answer, since these are not really related, rand:

  • it is of unspecified random quality
  • it is not re-entrant

Some of this functions are modifying some global state. (In windows) this state is shared per single thread - you can get unexpected result. For example, the first call of rand in every thread will give the same result, and it requires some care to make it pseudorandom, but deterministic (for debug purposes).

arsenm

basename() and dirname() aren't threadsafe.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!