When making automatically expanding arrays (like C++'s std::vector) in C, it is common (or at least common advice) to double the size of the array each time it is filled.
Compared to almost every other type of operation, malloc, calloc, and especially realloc are very expensive in run time. I've personally benchmarked 10,000,000 reallocs, and it takes a HUGE amount of time to do that.
Even though I had other operations going on at the same time (in both benchmark tests), I found that I could literally cut HOURS off of the run time by using max_size *= 2 instead of max_size += 1.
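As a rough illustration of the max_size *= 2 approach, here is a minimal sketch of an append routine. The IntVec type and intvec_push function are hypothetical names made up for this example (only max_size comes from the text above), and the error handling is just one reasonable choice.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable int array; all names here are illustrative only. */
typedef struct {
    int   *data;      /* heap block, or NULL while empty */
    size_t size;      /* elements currently in use */
    size_t max_size;  /* elements currently allocated */
} IntVec;

/* Append one value, doubling max_size whenever the array is full.
   Returns 0 on success and -1 if the array could not grow. */
static int intvec_push(IntVec *v, int value)
{
    if (v->size == v->max_size) {
        size_t new_max = v->max_size ? v->max_size * 2 : 1;

        /* Refuse growth that would overflow the element or byte count. */
        if (new_max < v->max_size || new_max > SIZE_MAX / sizeof *v->data)
            return -1;

        /* Go through a temporary so the old block is not lost if realloc fails.
           realloc(NULL, n) behaves like malloc(n), so an empty array works too. */
        int *p = realloc(v->data, new_max * sizeof *v->data);
        if (p == NULL)
            return -1;

        v->data = p;
        v->max_size = new_max;
    }
    v->data[v->size++] = value;
    return 0;
}
```

Starting from IntVec v = {0}, a thousand calls to intvec_push trigger only eleven reallocs (capacities 1, 2, 4, ..., 1024) rather than a thousand. Release the storage with free(v.data) when done.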
You have to step back from your code for a minute and think abstractly. What is the cost of growing a dynamic container? Programmers and researchers don't think in terms of "this took 2ms", but rather in terms of asymptotic complexity: What is the cost of growing by one element given that I already have n elements; how does this change as n increases?
If you only ever grew by a constant (or bounded) amount, then you would periodically have to move all the data, and so the cost of growing would depend on, and grow with, the size of the container. By contrast, when you grow the container geometrically, i.e. multiply its size by a fixed factor every time it is full, the average cost of inserting is actually independent of the number of elements, i.e. constant.
It is of course not always constant, but it's amortized constant, meaning that if you keep inserting elements, then the average cost per element is constant. Every now and then you have to grow and move, but those events get rarer and rarer as you insert more and more elements.
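To make the amortized argument concrete, the following sketch counts how many element copies each strategy performs while appending n items one at a time. It assumes the worst case, where every growth moves all existing elements, and the function names are invented for this example.

```c
#include <stddef.h>

/* Element copies needed to append n items when capacity grows by 1 each time.
   Worst-case assumption: every growth moves all existing elements. */
static unsigned long long copies_linear(size_t n)
{
    unsigned long long copies = 0;
    size_t cap = 0;
    for (size_t size = 0; size < n; size++) {
        if (size == cap) {      /* full: move `size` elements, add one slot */
            copies += size;
            cap += 1;
        }
    }
    return copies;
}

/* Same count when capacity doubles each time it is full. */
static unsigned long long copies_doubling(size_t n)
{
    unsigned long long copies = 0;
    size_t cap = 0;
    for (size_t size = 0; size < n; size++) {
        if (size == cap) {      /* full: move `size` elements, double capacity */
            copies += size;
            cap = cap ? cap * 2 : 1;
        }
    }
    return copies;
}
```

For n = 1000, copies_linear returns 499500, which is n(n-1)/2, while copies_doubling returns 1023. The doubling total always stays below 2n, so the average cost per inserted element is bounded by a constant, whereas the linear total grows quadratically.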
I once asked whether it makes sense for C++ allocators to be able to grow, in the way that realloc does. The answers I got indicated that the non-moving growing behaviour of realloc is actually a bit of a red herring when you think asymptotically. Eventually you won't be able to grow anymore, and you'll have to move, and so for the sake of studying the asymptotic cost, it's actually irrelevant whether realloc can sometimes be a no-op or not. (Moreover, non-moving growth seems to upset modern, arena-based allocators, which expect all their allocations to be of a similar size.)
Q: "Is doubling the capacity of a dynamic array necessary?"
A: No. One could grow only to the extent needed. But then you may truly copy data many times. It is a classic trade-off between memory and processor time. A good growth algorithm takes into account what is known about the program's data needs, without over-thinking those needs. An exponential growth of 2x is a happy compromise.
But now to your claim "following code only does 1 copy".
The amount of copying with advanced memory allocators may not be what OP thinks. Getting the same address back does not mean that the underlying memory mapping did not perform significant work. All sorts of activity goes on under the hood.
For memory allocations that grow & shrink a lot over the life of the code, I like grow and shrink thresholds that are spaced geometrically and offset from each other.
const size_t Grow[] = {1, 4, 16, 64, 256, 1024, 4096, ... };
const size_t Shrink[] = {0, 2, 8, 32, 128, 512, 2048, ... };
By using the grow thresholds while getting larger and the shrink thresholds while contracting, one avoids thrashing near a boundary. Sometimes a factor of 1.5 is used instead.
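One way to read those two tables, sketched below, is that capacity is always a power of 4, grows by 4x when the elements no longer fit, and shrinks by 4x only once the element count has dropped to one eighth of the capacity. That interpretation, the helper name next_capacity, and the overflow fallback are all assumptions made for this example rather than something stated above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given the current capacity (assumed to be a power
   of 4, at least 1) and the element count wanted, return the capacity to
   use next. */
static size_t next_capacity(size_t cap, size_t size)
{
    /* Grow thresholds 1, 4, 16, 64, ...: multiply by 4 until it fits. */
    while (size > cap) {
        if (cap > SIZE_MAX / 4)
            return size;    /* cannot grow 4x; fall back to an exact fit */
        cap *= 4;
    }

    /* Shrink thresholds 0, 2, 8, 32, ...: divide by 4 only once size has
       fallen to cap/8 or below, well away from the next grow threshold. */
    while (cap > 1 && size <= cap / 8)
        cap /= 4;

    return cap;
}
```

With this scheme an array that crosses from 16 to 17 elements grows to a capacity of 64 and keeps it until the count falls to 8, so a size that hovers around 16 or 17 never triggers repeated reallocation.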