Some people seem to think that C's strcpy() function is bad or evil. While I admit that it's usually better to use strncpy() in order to avoid bu
Your code is terribly inefficient because it runs through the string twice to copy it.
Once in strlen().
Then again in strcpy().
And you don't check s1 for NULL.
Storing the length in some additional variable costs you about nothing, while running through each and every string twice to copy it is a cardinal sin.
I'd tend to use memcpy if I have already calculated the length. Although strcpy is usually optimised to work on machine words, it feels like you should give the library as much information as you can, so that it can pick the best copying mechanism.
But for the example you give, it doesn't matter: if it's going to fail, it will fail in the initial strlen, so strncpy doesn't buy you anything in terms of safety (and presumably strncpy is slower, as it has to check both the bound and for NUL), and any difference between memcpy and strcpy isn't worth changing code for speculatively.
No one has mentioned strlcpy, developed by Todd C. Miller and Theo de Raadt. As they say in their paper:
The most common misconception is that strncpy() NUL-terminates the destination string. This is only true, however, if length of the source string is less than the size parameter. This can be problematic when copying user input that may be of arbitrary length into a fixed size buffer. The safest way to use strncpy() in this situation is to pass it one less than the size of the destination string, and then terminate the string by hand. That way you are guaranteed to always have a NUL-terminated destination string.
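The idiom described in the quote can be sketched roughly as follows. This is a minimal illustration, not code from the paper: copy_truncating is a hypothetical helper name, and the truncation flag it returns is an added assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (dst_size bytes in total),
 * always leaving dst NUL-terminated. Returns 1 if src did not fit. */
static int copy_truncating(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return 1;                     /* no room even for the terminator */

    strncpy(dst, src, dst_size - 1);  /* may leave dst without a NUL */
    dst[dst_size - 1] = '\0';         /* so terminate by hand */

    /* Extra pass over src, only to report whether truncation happened. */
    return strlen(src) >= dst_size;
}
```

With a 4-byte buffer, copying "abcdef" leaves "abc" in the buffer and reports truncation; copying "hi" fits and reports none.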
There are arguments against the use of strlcpy; the Wikipedia page notes that Drepper argues that strlcpy and strlcat make truncation errors easier for a programmer to ignore and thus can introduce more bugs than they remove.
However, I believe that avoiding strlcpy just forces people who know what they're doing to add NUL termination by hand, on top of manually adjusting the size argument to strncpy. Using strlcpy makes it much easier to avoid the buffer overruns that come from failing to NUL-terminate your buffer.
Also note that the lack of strlcpy in Microsoft's libraries, or in glibc before version 2.38, should not be a barrier to use; you can find the source for strlcpy and friends in any BSD distribution, and the license is likely friendly to your commercial/non-commercial project. See the comment at the top of strlcpy.c.
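To make the comparison concrete, here is a rough sketch of the behaviour strlcpy is documented to have: copy at most size - 1 bytes, always NUL-terminate when size is non-zero, and return the length of the source. It is not the BSD implementation (see strlcpy.c for that); sketch_strlcpy is a hypothetical name chosen so it cannot clash with a real strlcpy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in with strlcpy-like semantics.
 * Returns strlen(src); a result >= size means the copy was truncated. */
static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t src_len = strlen(src);

    if (size > 0) {
        /* Copy what fits, keeping one byte for the terminator. */
        size_t n = (src_len < size) ? src_len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

A caller detects truncation by comparing the return value with the buffer size, for example `if (sketch_strlcpy(buf, input, sizeof buf) >= sizeof buf)`. Skipping that comparison is exactly the kind of ignored truncation Drepper is worried about.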
The evil comes when people use it like this (although the below is super simplified):
void BadFunction(char *input)
{
    char buffer[1024]; // surely this will **always** be enough
    strcpy(buffer, input);
    ...
}
Which is a situation that happens surprisingly often.
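For contrast, one possible way to make that function refuse oversized input is sketched below. The name checked_function and its 0 / -1 return convention are assumptions made for illustration; whether to reject, truncate, or allocate dynamically depends on the caller.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counterpart to BadFunction.
 * Returns 0 on success, -1 if input is NULL or does not fit in buffer. */
static int checked_function(const char *input)
{
    char buffer[1024];
    size_t len;

    if (input == NULL)
        return -1;

    len = strlen(input);
    if (len >= sizeof buffer)          /* need room for the NUL as well */
        return -1;

    memcpy(buffer, input, len + 1);    /* length is known, so copy it in one go */
    /* ... work with buffer ... */
    return 0;
}
```

Once the length has been checked like this, a plain strcpy would be just as safe as the memcpy; the check is what provides the safety, not the copy function.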
But yeah, strcpy is as good as strncpy in any situation where you are allocating memory for the destination buffer and have already used strlen to find the length.
#include <stdlib.h>
#include <string.h>

// toss() and __WHENCE__ are presumably the author's own error-reporting helpers, not standard C
char *dupstr(const char *str)
{
    size_t full_len; // includes the NUL terminator
    char *ret;
#ifdef _DEBUG
    if (!str)
        toss("arg 1 null", __WHENCE__);
#endif
    full_len = strlen(str) + 1;
    if (!(ret = (char *) malloc(full_len)))
        toss("out of memory", __WHENCE__);
    memcpy(ret, str, full_len); // length already known, so strcpy() would be slower
    return ret;
}
I think strncpy is evil too.
To truly protect yourself from programming errors of this kind, you need to make it impossible to write code that (a) looks OK, and (b) overruns a buffer.
This means you need a real string abstraction, which stores the buffer and capacity opaquely, binds them together, forever, and checks bounds. Otherwise, you end up passing strings and their capacities all over the shop. Once you get to real string ops, like modifying the middle of a string, it's almost as easy to pass the wrong length into strncpy (and especially strncat), as it is to call strcpy with a too-small destination.
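The strncat pitfall mentioned above is easy to show. In the sketch below, append_truncating is a hypothetical helper, and it assumes dst already holds a NUL-terminated string shorter than dst_size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append src to the string already in dst
 * (dst_size bytes in total). Returns 1 if src did not fit completely.
 * Assumes dst is NUL-terminated and strlen(dst) < dst_size. */
static int append_truncating(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);
    size_t room = dst_size - used - 1;  /* bytes left, excluding the NUL */

    /* The tempting call strncat(dst, src, dst_size) is wrong: the third
     * argument limits the bytes taken from src, not the size of dst. */
    strncat(dst, src, room);            /* strncat always appends a NUL */

    return strlen(src) > room;
}
```

Every call site has to get the "size minus what is already there minus one" arithmetic right, which is the bookkeeping a string abstraction would hide.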
Of course you might still ask whether to use strncpy or strcpy in implementing that abstraction: strncpy is safer there provided you fully grok what it does. But in string-handling application code, relying on strncpy to prevent buffer overflows is like wearing half a condom.
So, your strdup-replacement might look something like this (order of definitions changed to keep you in suspense):
string *string_dup(const string *s1) {
    string *s2 = string_alloc(string_len(s1));
    if (s2 != NULL) {
        string_set(s2, s1);
    }
    return s2;
}

static inline size_t string_len(const string *s) {
    return strlen(s->data);
}

static inline void string_set(string *dest, const string *src) {
    // potential (but unlikely) performance issue: strncpy 0-fills dest,
    // even if the src is very short. We may wish to optimise
    // by switching to memcpy later. But strncpy is better here than
    // strcpy, because it means we can use string_set even when
    // the length of src is unknown.
    strncpy(dest->data, src->data, dest->capacity);
}

string *string_alloc(size_t maxlen) {
    // guard against overflow in the size computation below
    if (maxlen > SIZE_MAX - sizeof(string) - 1) return NULL;
    string *self = malloc(sizeof(string) + maxlen + 1);
    if (self != NULL) {
        // empty string
        self->data[0] = '\0';
        // strncpy doesn't NUL-terminate if it prevents overflow,
        // so exclude the NUL-terminator from the capacity, set it now,
        // and it can never be overwritten.
        self->capacity = maxlen;
        self->data[maxlen] = '\0';
    }
    return self;
}

typedef struct string {
    size_t capacity;
    char data[]; // flexible array member (C99)
} string;
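The two strncpy behaviours that the comments in string_set and string_alloc rely on (zero-filling a short copy, and omitting the NUL on a full one) can be observed with a small experiment. The helper names below are hypothetical and exist only for this demonstration.

```c
#include <stddef.h>
#include <string.h>

/* Counts how many of the first n bytes of buf are zero. */
static size_t count_zero_bytes(const char *buf, size_t n)
{
    size_t zeros = 0;
    for (size_t i = 0; i < n; i++)
        if (buf[i] == '\0')
            zeros++;
    return zeros;
}

/* Fills dst with 'x', copies src into it with strncpy, and reports
 * how many of the n destination bytes ended up as zero. */
static size_t zeros_after_strncpy(char *dst, size_t n, const char *src)
{
    memset(dst, 'x', n);
    strncpy(dst, src, n);
    return count_zero_bytes(dst, n);
}
```

With a 16-byte destination, copying "hi" leaves 14 zero bytes (the padding), while copying a source of 16 or more characters leaves none, not even a terminator. That second case is why string_alloc reserves one extra byte outside the capacity and sets it to NUL up front.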
The problem with these string abstractions is that nobody can ever agree on one (for instance whether strncpy's idiosyncrasies mentioned in comments above are good or bad, whether you need immutable and/or copy-on-write strings that share buffers when you create a substring, etc). So although in theory you should just take one off the shelf, you can end up with one per project.