How does adding and removing elements \"rescale\" the data? How is the size of the vector calculated (I believe it is kept track of)? Any other additional resources to le
In terms of sizing, there are two values of interest for a std::vector
: size
, and capacity
(accessed via .size()
and .capacity()
).
.size()
is the number of elements that are contained in the vector, whereas .capacity()
is the number of elements that can be added to the vector, before memory will be re-allocated.
If you .push_back()
an element, size will increase by one, up until you hit the capacity. Once the capacity is reached, most (all?) implementations, re-allocate memory, doubling the capacity.
You can reserve a capacity using .reserve()
. For example:
std::vector<int> A;
A.reserve(1); // A: size:0, capacity:1 {[],x}
A.push_back(0); // A: size:1, capacity:1 {[0]}
A.push_back(1); // A: size:2, capacity:2 {[0,1]}
A.push_back(2); // A: size:3, capacity:4 {[0,1,2],x}
A.push_back(3); // A: size:4, capacity:4 {[0,1,2,3]}
A.push_back(4); // A: size:5, capacity:8 {[0,1,2,3,4],x,x,x}
Reallocations of memory would occur at lines 4, 5, and 7.
The implementation of std::vector
changed slightly with C++0x and later with the introduction of move semantics (see What are move semantics? for an introduction).
When adding an element to a std::vector
which is already full then the vector
is resized which involves a procedure of allocating a new, larger memory area, moving the existing data to the new vector
, deleting the old vector
space, and then adding the new element.
std::vector
is a collection class in the Standard Template Library. Putting objects into a vector
, taking them out, or the vector
performing a resize when an item is added to a full vector
all require that the class of the object support an assignment operator, a copy constructor, and move semantics. (See type requirements for std::vector as well as std::vector works with classes that are not default constructible? for details.)
One way to think of std::vector
is as a C style array
of contiguous elements of the type specified when the vector
is defined that has some additional functionality to integrate it into the Standard Template Library offerings. What separates a vector
from a standard array
is that a vector
will dynamically grow as items are added. (See std::vector and c-style arrays as well as When would you use an array rather than a vector/string? for some discussion about differences.)
Using std::vector
allows the use of other Standard Template Library components such as algorithms so using std::vector
comes with quite a few advantages over a C style array
as you get to use functionality that already exists.
You can specify an initial size if the maximum is known ahead of time. (See Set both elements and initial capacity of std::vector as well as Choice between vector::resize() and vector::reserve() )
The basics of std::vector
physical representation is of a set of pointers using memory allocated from the heap. These pointers allow for the actual operations for accessing the elements stored in the vector
, deleting elements from the vector
, iterating over the vector
, determining the number of elements, determining its size, etc.
Since the physical representation is contiguous memory, deleting items may result in moving of remaining items to close any holes created by the delete operation.
With modern C++ move semantics, the overhead of std::vector
has been reduced such that it is typically the default container that would be used for most applications as recommended by Bjarne Stroustrup in his book The C++ Programming Language 4th Edition which discusses C++11.
The vector usually has three pointers. If the vector has never been used they are all 0, or NULL.
When an element is inserted, the vector allocates some storage and sets its pointers. It might allocate 1 element, or it might allocate 4 elements. Or 50.
Then it inserts the element and increments the last element pointer.
When you insert more elements than are allocated the vector has to get more memory. It goes out and gets some. If the memory location changes then it has to copy all the elements into the new space and free the old space.
A common choice for resizing is to double the allocation every time it needs more memory.
I think the basic idea of the std::vector
can be understood with an example:
template<typename T>
class vector {
T *storage;
unsigned int length, cap;
void resizeStorage() {
int *copy = new T[cap];
for (unsigned int i = 0 ; i < length, ++i) {
copy[i] = storage[i];
}
delete [] storage;
storage = copy
}
public:
vector(unsigned int cap = 1): length(0), cap(cap), storage(new T[cap]) {
if (!cap)
cap = 1;
}
unsigned int size() {
return this.length;
}
unsigned int capacity() {
return this.cap;
}
T& operator[](int index) {
return storage[index];
}
const T& operator[](int index) const {
return storage[index];
}
void push_back(T element) {
reserve(++length);
storage[length] = element;
}
void reserve(int capacity) {
if(cap >= capacity) {
return;
}
while(cap < capacity) {
cap *= 2;
}
resizeStorage();
}
virtual ~vector() {
delete[] storage;
}
}
We need to reserve enough capacity for every push_back
if the storage size is too small. We can reserve manually too, and clean up memory when we are done. I've used a factor of 2 here, as this is generally how arrays are resized in memory (you could use 3 if you wanted to).
Note that the virtual destructor is generally considered best practice, and not strictly necessary here without inheritance involved. Making the destructor non-virtual would potentially result in faster static binding. However, given that vectors can contain unknown objects it's highly likely vector
has a virtual destructor to deallocate elements for a derived class.
The capacity parameter in the constructor also allows for more fine-grained control than versions prior to C++ 11, and in fact did not exist prior to C++ 11. However, I have included it as vector
included it with Allocator
s in C++ 14. I won't dive into the details (Allocator
simply is another template class to allocate the individual vector elements). The stdlib provides these high-level abstractions to standardise common operations with performance.
There are numerous helper functions for vector
such as swap
, begin
and end
in the stdlib as well, however these just operate on the array store in a safe manner. The actual core vector
implementation is based on the above (similar to C++ standards prior to 11), and additional logic is inferable.
The array copying logic could be done using array std::copy
, and also we could have used smart pointers (specifically shared_ptr
) to clean-up implicitly. But instead, I chose to use low-level APIs though to demonstrate the logical steps.
I wrote a vector in C++ a year or so ago. It is an array with a set size (ex. 16 chars) which is expanded by that amount when needed. That is to say, if the default size is 16 chars and you need to store Hi my name is Bobby
, then it will double the size of the array to 32 chars then store the char array there.