In an interview, I was asked to write an implementation of strcpy and then fix it so that it properly handles overlapping strings. My implementation is below, and it is very naive.
I was asked this in a recent interview. We don't have to 'detect' overlap. We can write strcpy
in such a way that overlapping addresses are taken care of. The key is to copy from the end of source string instead of from the start.
Here is a quick sketch:
void str_copy(const char *src, char *dst)
{
    /* error checks */
    size_t i = strlen(src) + 1; /* + 1 accounts for the null character */
    while (i > 0)
    {
        i--;
        dst[i] = src[i]; /* copy from the end of the string towards the start */
    }
}
EDIT: This only works when src < dst, that is, when the destination starts at a higher address than the source. For src > dst, copy from the start.
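To illustrate the EDIT above, here is a minimal sketch that picks the copy direction at run time. The name str_copy_any is hypothetical. Comparing unrelated pointers with < is undefined in C, so this sketch compares the addresses after casting to uintptr_t, which assumes a platform with a flat address space where that conversion preserves ordering. That assumption is common but is not guaranteed by the standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies src to dst even if the two ranges overlap.
 * Assumption: casting to uintptr_t gives comparable addresses (flat memory). */
char *str_copy_any(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t n = strlen(src) + 1; /* bytes to copy, including the '\0' */

    if ((uintptr_t)dst <= (uintptr_t)src) {
        /* Destination starts at or before the source: copy from the start. */
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    } else {
        /* Destination starts after the source: copy from the end. */
        for (size_t i = n; i > 0; i--)
            dst[i - 1] = src[i - 1];
    }
    return dst;
}
```

The length is measured before any byte is written, so the terminator cannot be overwritten before strlen has seen it.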
You could probably use memmove() if you expect the strings to be overlapping.
char* my_strcpy(char *a, const char *b)
{
    /* memmove is specified to work for overlapping ranges; + 1 copies the '\0' */
    memmove(a, b, strlen(b) + 1);
    return a;
}
if ((a >= b && a <= b + strlen(b)) || (a + strlen(b) >= b && a + strlen(b) <= b + strlen(b)))
(*) you should cache strlen(b) to improve performance
What it does: it checks whether a + len [address of a + extra len bytes] is inside the string b, or whether a [address of a] is inside the string b. These are the only possibilities for the strings overlapping.
There is no portable way to detect this. You have to do pointer comparisons, and relational comparisons (<, <=, >, >=) are only defined for pointers into the same object. That is, if the two strings do not overlap and are in fact different objects, then the pointer comparisons give you undefined behaviour.
I would let the standard library handle this by using memmove(a, b, strlen(b) + 1).
EDIT:
As Steve Jessop pointed out in the comments, there actually is a portable but slow way to detect overlap in this case. Compare each address within b with the first and last address of a for equality. The equality comparison with == is always well defined.
So you have something like this:
size_t l = strlen(b);
int isoverlap = 0;
for (size_t i = 0; i <= l; i++)
{
    /* a is the first byte of the destination, a + l its last byte */
    if ((b + i == a) || (b + i == a + l))
    {
        isoverlap = 1;
        break;
    }
}
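The equality-only idea can also decide the copy direction, not just report overlap. The sketch below is one way to do that under stated assumptions: the names dst_inside_src and portable_strcpy are hypothetical, and the destination is assumed to have room for the whole string. Only == is used between the two pointers, so no relational comparison across objects takes place.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero if dst points at one of src[1..len].
 * In that case a forward copy would overwrite source bytes not yet read. */
static int dst_inside_src(const char *dst, const char *src, size_t len)
{
    for (size_t i = 1; i <= len; i++)
        if (src + i == dst)
            return 1;
    return 0;
}

/* Hypothetical strcpy variant that tolerates overlap using only ==. */
char *portable_strcpy(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t len = strlen(src);

    if (dst_inside_src(dst, src, len)) {
        /* dst lies inside the source string: copy from the end, '\0' first. */
        for (size_t i = len + 1; i > 0; i--)
            dst[i - 1] = src[i - 1];
    } else {
        /* Disjoint, identical, or dst before src: a forward copy is safe. */
        for (size_t i = 0; i <= len; i++)
            dst[i] = src[i];
    }
    return dst;
}
```

The scan costs an extra pass over the source, which is the "slow" part mentioned above.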
EDIT 2: Visualization of case 2
You have something like the following array and pointers:
S t r i n g 0 _ _ _ _ _ _ _
^       ^
|       |
b       a
Note that b + strlen(b) results in a pointer to the terminating \0. Start one position past it; otherwise you need extra handling of edge cases. It is valid to set the pointers there, you just can't dereference them.
src = b + strlen(b) + 1;
dst = a + strlen(b) + 1;
S t r i n g 0 _ _ _ _ _ _ _
^       ^     ^       ^
|       |     |       |
b       a     src     dst
Now the copy loop which copies the \0, too.
while (src > b)
{
    src--; dst--;
    *dst = *src;
}
The first step gives this:
src--; dst--;
S t r i n g 0 _ _ _ _ _ _ _
^       ^   ^       ^
|       |   |       |
b       a   src     dst
*dst = *src;
S t r i n g 0 _ _ _ 0 _ _ _
^       ^   ^       ^
|       |   |       |
b       a   src     dst
And so on, until src ends up equal to b:
S t r i S t r i n g 0 _ _ _
^       ^
|       |
b       a
src     dst
If you want it a bit more hackish, you could compress it further, but I don't recommend this:
while (src > b)
    *(--dst) = *(--src);
If these two strings overlap, then while copying you'll run over the original a or b pointers.
Assuming that strcpy( a, b ) roughly means a <- b, i.e., the first parameter is the destination of the copy, then you only check whether the copy pointer reaches b's position.
You only need to save b's original position and, while copying, check that you haven't reached it. Also, don't write the trailing zero if you have reached that position.
char* my_strcpy(char *a, const char *b)
{
    if ( a == NULL
      || b == NULL )
    {
        return NULL;
    }
    char *n = a;
    const char * oldB = b;
    while( *b != '\0'
        && a != oldB )
    {
        *a = *b;
        a++;
        b++;
    }
    if ( a != oldB ) {
        *a = '\0';
    }
    return n;
}
This algorithm just stops copying when the destination pointer reaches the start of the source. You may want to do something else instead, such as signaling the error condition or writing an end-of-string mark at the previous position; failing silently, as the algorithm does at the moment, isn't the best option.
Hope this helps.
Even without using relational pointer comparisons, memmove, or an equivalent, it is possible to code a version of strcpy that is performed as a strlen and memcpy in the non-overlapping case, and as a top-down copy in the overlapping case. The key is to exploit the fact that if the first byte of the destination is read and then replaced with zero, calling strlen on the source and adding the returned value to the source pointer will yield a legitimate pointer that equals the start of the destination in the "troublesome overlap" case. If the source and destination are different objects, the "source plus strlen" pointer may be safely computed and observed to be unequal to the destination.
In the event that adding the string length to the source pointer yields the destination pointer, replacing the zero byte with the earlier-read value and calling strlen on the destination will allow code to determine the ending address of the source and destination strings. Further, the length of the source string measured up to the zero byte will indicate the distance between the pointers. If this value is large (probably greater than 16 or so), code may efficiently subdivide the "move" operation into a top-down sequence of memcpy operations. Otherwise the string may be copied with a top-down loop of single-byte copy operations, or with a sequence of "memcpy source to buffer"/"memcpy buffer to destination" operations [if the per-byte cost of a large memcpy is less than half that of an individual-character-copy loop, using a ~256-byte buffer may be a useful optimization].
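A simplified sketch of the zero-byte probe described above might look like the following. The name strcpy_zero_probe is hypothetical, and the sketch makes two simplifying assumptions: both branches use plain single-byte loops instead of the chunked memcpy strategy, and the non-troublesome branch uses a forward loop so that it also stays safe when the destination starts before the source and the ranges overlap. It also reads the first destination byte before writing it, which assumes that byte may be read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical strcpy variant using the "write a zero, then strlen" probe. */
char *strcpy_zero_probe(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);

    char saved = *dst;          /* remember the first destination byte */
    *dst = '\0';                /* plant a terminator at the destination */
    size_t head = strlen(src);  /* stops at dst if dst lies inside src */

    if (src + head == dst) {
        /* Troublesome overlap: dst is inside the source string. */
        *dst = saved;                       /* undo the probe */
        size_t total = head + strlen(dst);  /* full source length */
        for (size_t i = total + 1; i > 0; i--)
            dst[i - 1] = src[i - 1];        /* top-down copy, '\0' first */
    } else {
        /* dst is not inside the source string, so head is the full length.
         * The planted zero is overwritten by the first byte copied. */
        for (size_t i = 0; i <= head; i++)
            dst[i] = src[i];
    }
    return dst;
}
```

Only an equality comparison is made between the two pointers, and src + head always stays within the source string, so the check itself does not depend on how the two objects are laid out in memory.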