I have two questions.
Do realloc() and memcpy() copy the entries in an array to another in a way faster than just iterating on each element?
The x86 has special instructions for scanning and matching a byte/word in a block of memory, as well as one that can be used to copy a block of memory (it is a CISC CPU, after all). Many C compilers that support inline assembly and a pragma for inlining entire functions have taken advantage of this in their library functions for many years.
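The ones used for memory copying are MOVSB/MOVSW, combined with the REP prefix.
String instructions: CMPS/MOVS/SCAS/STOS
Repeat prefixes: REP, REPE, REPNE, REPNZ, REPZ
Set up the registers with the source and destination addresses and the count (ESI, EDI and ECX, or RSI, RDI and RCX in 64-bit mode), and away you go.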
As others have said, it won't be faster than O(n), but memory systems often have a preferred block size, and it's also possible to, say, write a whole cache line at a time.
1 - No. They copy a block at a time. See http://www.embedded.com/design/configurable-systems/4024961/Optimizing-Memcpy-improves-speed for a pretty good analysis.
2 - This is implementation dependent. See http://www.gnu.org/software/libtool/manual/libc/Changing-Block-Size.html for glibc details: "In several allocation implementations, making a block smaller sometimes necessitates copying it."
Let's take a closer look at memcpy and, while we're at it, at "big O" or Landau notation.
First, big-O. As I've talked about elsewhere, it's worth remembering the definition of big O: some function g(n) is said to be O(f(n)) when there exist constants k and n0 for which g(n) ≤ kf(n) for all n ≥ n0. What the constant does is let you ignore the little details in favor of the important part. As everyone has noted, memcpy of n bytes will be O(n) on almost any normal architecture, because no matter what, you have to move those n bytes, one chunk at a time. So a first, naive implementation of memcpy in C could be written as:
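#include <stddef.h>

/* Named naive_memcpy because defining our own memcpy would clash
   with the standard library's. One iteration per byte: O(n). */
void *naive_memcpy(void *s1, const void *s2, size_t size)
{
    unsigned char *dst = s1;
    const unsigned char *src = s2;
    for (size_t ix = 0; ix < size; ix++)
        dst[ix] = src[ix];
    return s1;
}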
This is in fact O(n), and might make you wonder why we even bother with a library routine. However, the thing about the libc functions is that they are the place where platform-specific utilities get written; if you want to optimize for the architecture, this is one of the places you can do it. So, depending on the architecture, there may be more efficient implementation options. For example, the IBM System/370 architecture (the successor to the 360) has an MVCL (Move Long) instruction that moves data in big chunks using very highly optimized microcode. MVCL works on even/odd register pairs: the even register holds an address and the odd register holds the length. So in place of that loop, a 370 implementation of memcpy might instead look something like
LA 2,S1 load S1 address into register 2 (destination)
L 3,SIZE load the length into register 3
LA 4,S2 load S2 address into register 4 (source)
LR 5,3 copy the length into register 5
MVCL 2,4 move SIZE bytes from S2 to S1
(No guarantees that's exactly right 370 code, by the way, but it'll serve for an illustration.) Instead of doing n steps around the loop as the C code did, this implementation just executes five instructions.
What really happens, though, is that it's executing O(n) microinstructions under the covers. What's different between the two is the constant k: because the microcode is much faster, and because there are only a handful of instructions to decode, it is dramatically faster than the naive version, but it's still O(n); it's just that the constant is smaller.
And that's why you can make good use of memcpy: it's not asymptotically faster, but the implementation is as fast as someone could make it on that particular architecture.
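To make the role of the constant concrete, here is a small worked comparison. The constants a, b (naive loop) and a', b' (microcoded block move) are hypothetical positive numbers standing for a fixed setup cost and a per-byte cost; the only assumption is that the microcoded per-byte cost is smaller, b' < b.

```latex
\begin{aligned}
T_{\text{naive}}(n) &= a + b\,n \le (a + b)\,n && \text{for all } n \ge 1,\\
T_{\text{micro}}(n) &= a' + b'\,n \le (a' + b')\,n && \text{for all } n \ge 1,\\
\lim_{n \to \infty} \frac{T_{\text{naive}}(n)}{T_{\text{micro}}(n)} &= \frac{b}{b'}.
\end{aligned}
```

Both bounds have the form g(n) ≤ kn with n0 = 1, so both versions are O(n); for large n the speed-up settles at the fixed ratio b/b' rather than growing with n.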
Some important points about realloc (checked on Dev-C++), whose prototype is void *realloc(void *ptr, size_t size);
The realloc() function shall change the size of the memory object pointed to by ptr to the size specified by size.
The contents of the object shall remain unchanged up to the lesser of the new and old sizes.
If the new size is larger, the contents of the newly allocated portion of the object are unspecified.
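If size is 0 and ptr is not a null pointer, the object pointed to is freed. (In ISO C17 it is implementation-defined whether the old object is freed in this case, and C23 makes it undefined behavior, so portable code should avoid realloc(ptr, 0).)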
If ptr is a null pointer, realloc() shall be equivalent to malloc() for the specified size.
If ptr does not match a pointer returned earlier by calloc(), malloc(), or realloc() or if the space has previously been deallocated by a call to free() or realloc(), the behavior is undefined.
The performance of memcpy can't really be better than O(N), but it can be optimized so that it outperforms manual copying; for example, it might be able to copy 4 bytes in the time it takes you to copy 1 byte. Many memcpy implementations are written in assembly using optimized instructions that can copy multiple elements at a time, which is usually faster than copying data one byte at a time.
I don't quite understand this question. If you use realloc to decrease the size of a memory block and it succeeds (returns non-NULL), the new location will contain the same data as the old location, up to the size of the new request. If the memory location changed as a result of calling realloc (not usual when decreasing the size), the contents will have been copied; otherwise no copying needs to happen, because the memory hasn't moved.