There are some quite similar questions around here, but they didn't help me get my head around it. Also, I'm giving a full code example, so it might be easier for others to understand.
I have made a vector container (I couldn't use the STL for memory reasons) that used to use only operator= for push_back*, and once I came across placement new, I decided to add an "emplace_back" to it**.
*(T::operator= is expected to deal with memory management)
**(the name is taken from a similar function in std::vector that I encountered later; the original name I gave it was a mess).
I've read some things about the dangers of using placement new over operator new[], but I couldn't figure out whether the following is OK, and if not, what's wrong with it and what I should replace it with, so I'd appreciate your help.
This is of course simplified code, with no iterators and no extended functionality, but it makes the point:
template <class T>
class myVector {
public :
myVector(int capacity_) {
_capacity = capacity_;
_data = new T[_capacity];
_size = 0;
}
~myVector() {
delete[] _data;
}
bool push_back(T const & t) {
if (_size >= _capacity) { return false; }
_data[_size++] = t;
return true;
}
template <class... Args>
bool emplace_back(Args const & ... args) {
if (_size >= _capacity) { return false; }
_data[_size].~T();
new (&_data[_size++]) T(args...);
return true;
}
T * erase (T * p) {
//assert(/*p is not aligned*/);
if (p < begin() || p >= end()) { return end(); }
if (p == &back()) { --_size; return end(); }
*p = back();
--_size;
return p;
}
// The usual stuff (and more)
int capacity() { return _capacity; }
int size() { return _size; }
T * begin() { return _data; }
T * end() { return _data + _size; }
T const * begin() const { return _data; }
T const * end() const { return _data + _size; }
T & front() { return *begin(); }
T & back() { return *(end() - 1); }
T const & front() const { return *begin(); }
T const & back() const { return *(end() - 1); }
T & operator[] (int i) { return _data[i]; }
T const & operator[] (int i) const { return _data[i]; }
private:
T * _data;
int _capacity;
int _size;
};
Thanks
I read some stuff about the danger of using placement new over operator new[] but couldn't figure out if the following is ok or not, and if not, what's wrong with it [...]
For operator new[] vs. placement new, it's only really bad (as in the typically crashy kind of undefined behavior) if you mix the two strategies together.
The main choice you typically have to make is to use one or the other. If you use operator new[], then you construct all the elements for the entire capacity of the container in advance and overwrite them in methods like push_back. You don't destroy them on removal in methods like erase; you just keep them there, adjust the size, overwrite elements, and so forth. You both allocate and construct multiple elements all in one go with operator new[], and destroy and deallocate them all in one go with operator delete[].
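For comparison, here is a minimal sketch of the operator new[] strategy on its own, assuming T is default-constructible and copy-assignable. The class name FixedVector is hypothetical. Every slot is constructed up front, push_back overwrites a slot, pop_back only shrinks the count, and delete[] destroys and frees everything at once. Copying is disabled so two containers never delete[] the same buffer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity container using the "construct everything up front" strategy.
// Requires T to be default-constructible and copy-assignable.
template <class T>
class FixedVector {
public:
    explicit FixedVector(std::size_t capacity)
        : _data(new T[capacity]),  // allocates AND default-constructs every slot
          _capacity(capacity),
          _size(0) {}

    ~FixedVector() { delete[] _data; }  // destroys every slot, then frees the memory

    FixedVector(const FixedVector&) = delete;  // copying omitted for brevity
    FixedVector& operator=(const FixedVector&) = delete;

    bool push_back(const T& t) {
        if (_size >= _capacity) return false;
        _data[_size++] = t;  // assignment onto an already-constructed slot
        return true;
    }

    void pop_back() {
        assert(_size > 0);
        --_size;  // the element stays alive; it is simply no longer counted
    }

    std::size_t size() const { return _size; }
    T& operator[](std::size_t i) { assert(i < _size); return _data[i]; }

private:
    T* _data;
    std::size_t _capacity;
    std::size_t _size;
};
```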
Why Placement New is Used For Standard Containers
The first thing to understand, if you want to roll your own vectors or other standard-compliant sequences (other than simple linked structures with one element per node) that actually destroy elements when they are removed and construct elements (rather than merely overwrite them) when they are added, is that you need to separate allocating the memory for the container from constructing the elements in it. So, quite to the contrary, placement new isn't bad in this case. It's a fundamental necessity for achieving the general qualities of the standard containers. But we can't mix it with operator new[] and operator delete[] in this context.
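To make the separation concrete, here is a small sketch that walks one buffer through the four distinct steps: allocate, construct, destroy, deallocate. The Tracked type, the Observed struct and separate_steps are hypothetical. Tracked deliberately has no default constructor, so new Tracked[3] would not compile, while placement new works fine.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical element type with no default constructor; it counts live instances.
struct Tracked {
    inline static int live = 0;
    explicit Tracked(int v) : value(v) { ++live; }
    Tracked(const Tracked& other) : value(other.value) { ++live; }
    ~Tracked() { --live; }
    int value;
};

// Number of live Tracked objects observed after each step.
struct Observed {
    int after_allocation;
    int after_construction;
    int after_destruction;
};

// Walks through the separate steps on a buffer with room for 3 Tracked objects.
Observed separate_steps() {
    Observed o{};
    // 1. Allocate: raw bytes only, no Tracked object exists yet.
    void* raw = std::malloc(3 * sizeof(Tracked));
    if (!raw) throw std::bad_alloc();
    Tracked* slots = static_cast<Tracked*>(raw);
    o.after_allocation = Tracked::live;

    // 2. Construct: placement new creates an object in the first slot only.
    Tracked* t = new (slots) Tracked(42);
    assert(t->value == 42);
    o.after_construction = Tracked::live;

    // 3. Destroy: call the destructor explicitly; the memory stays allocated.
    t->~Tracked();
    o.after_destruction = Tracked::live;

    // 4. Deallocate: hand the raw bytes back.
    std::free(raw);
    return o;
}
```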
For example, you might allocate enough memory to hold 100 instances of T in reserve, but you don't want to default-construct them as well. You want to construct them in methods like push_back, insert, resize, the fill ctor, range ctor, copy ctor, etc.: methods that actually add elements, not merely the capacity to hold them. That's why we need placement new.
Otherwise we lose the generality of std::vector, which avoids constructing elements that aren't there, can copy-construct in push_back rather than simply overwriting existing elements with operator=, etc.
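The difference in construction counts can be sketched like this. Counted and both helper functions are hypothetical. new[] constructs all n slots up front, while raw storage constructs only the element that is actually added.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical type that counts every constructor call.
struct Counted {
    inline static int constructions = 0;
    Counted() { ++constructions; }
    Counted(const Counted&) { ++constructions; }
};

// Capacity via new[]: every one of the n slots is default-constructed immediately.
int constructions_with_new_array(std::size_t n) {
    Counted::constructions = 0;
    Counted* p = new Counted[n];
    int count = Counted::constructions;
    delete[] p;
    return count;
}

// Capacity via raw storage: nothing is constructed until an element is actually added.
int constructions_with_raw_storage(std::size_t n) {
    assert(n > 0);
    Counted source;              // the value we want to "push_back"
    Counted::constructions = 0;  // count only what happens inside the buffer
    void* raw = std::malloc(n * sizeof(Counted));
    if (!raw) throw std::bad_alloc();
    Counted* p = static_cast<Counted*>(raw);
    new (p) Counted(source);     // push_back-style copy construction of one element
    int count = Counted::constructions;
    p->~Counted();
    std::free(raw);
    return count;
}
```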
So let's start with the constructor:
_data = new T[_capacity];
... this will invoke the default constructor for every element. We don't want that (neither the default ctor requirement nor the expense), because the whole point of using placement new is to construct elements in place in allocated memory, and this would have already constructed all of them. Otherwise any use of placement new anywhere will try to construct an already-constructed element a second time, which leaks or leads to UB.
Instead you want something like this:
_data = static_cast<T*>(malloc(_capacity * sizeof(T)));
This just gives us a raw chunk of bytes.
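Here is a sketch of just the allocation side, assuming malloc/free as in the line above. RawBuffer, its allocate helper and NoDefault are hypothetical names. It adds checks the one-liner leaves out: malloc reports failure with a null pointer rather than an exception, the byte count must not overflow, and malloc only guarantees alignment for types no stricter than std::max_align_t. It also shows that T no longer needs a default constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <new>

// Hypothetical skeleton showing only the allocation side of the container:
// memory for `capacity` elements is obtained, but no T is constructed.
template <class T>
class RawBuffer {
    // malloc only guarantees alignment suitable for fundamental alignments.
    static_assert(alignof(T) <= alignof(std::max_align_t), "over-aligned T needs aligned allocation");

public:
    explicit RawBuffer(std::size_t capacity)
        : _data(allocate(capacity)), _capacity(capacity), _size(0) {}

    // Nothing was constructed in this skeleton, so there is nothing to destroy: just free.
    ~RawBuffer() { std::free(_data); }

    RawBuffer(const RawBuffer&) = delete;
    RawBuffer& operator=(const RawBuffer&) = delete;

    std::size_t capacity() const { return _capacity; }
    std::size_t size() const { return _size; }

private:
    static T* allocate(std::size_t n) {
        if (n > std::numeric_limits<std::size_t>::max() / sizeof(T))
            throw std::bad_array_new_length();  // n * sizeof(T) would overflow
        void* p = std::malloc(n * sizeof(T));
        if (!p && n > 0) throw std::bad_alloc();  // malloc reports failure with nullptr
        return static_cast<T*>(p);
    }

    T* _data;
    std::size_t _capacity;
    std::size_t _size;
};

// Works even for element types that have no default constructor.
struct NoDefault {
    explicit NoDefault(int v) : value(v) {}
    int value;
};
```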
Second, for push_back, you're doing:
_data[_size++] = t;
That uses the assignment operator, and, after our previous modification, on an uninitialized element that hasn't been constructed yet. So we want:
new(_data + _size) T(t);
++_size;
... which uses the copy constructor instead. It makes push_back match what it is actually supposed to do: create new elements in the sequence instead of simply overwriting existing ones.
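Here is the corrected push_back as a hypothetical free function (push_back_into) over raw storage, so it can be shown without the whole class. The size is incremented only after the copy constructor returns, so a throwing constructor leaves the count unchanged. demo_push_back is a usage example with std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Hypothetical free-function version of the corrected push_back, operating on
// raw storage `data` that holds `size` constructed elements out of `capacity`.
template <class T>
bool push_back_into(T* data, std::size_t& size, std::size_t capacity, const T& t) {
    if (size >= capacity) return false;
    new (data + size) T(t);  // copy-construct into the uninitialized slot
    ++size;                  // count it only after construction succeeded
    return true;
}

// Usage: push three strings into room for two, then clean up by hand.
std::string demo_push_back() {
    const std::size_t capacity = 2;
    void* raw = std::malloc(capacity * sizeof(std::string));
    if (!raw) throw std::bad_alloc();
    auto* data = static_cast<std::string*>(raw);
    std::size_t size = 0;

    push_back_into(data, size, capacity, std::string("hello"));
    push_back_into(data, size, capacity, std::string(" world"));
    bool third = push_back_into(data, size, capacity, std::string("!"));
    assert(!third && size == 2);  // the buffer is full

    std::string result = data[0] + data[1];
    for (std::size_t i = size; i > 0; --i)
        data[i - 1].~basic_string();  // destroy only what we constructed
    std::free(raw);
    return result;
}
```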
Your erase method needs some work even at the basic logic level if you want to handle removals from the middle of the container. But just from the resource management standpoint, if you use placement new, you want to manually invoke destructors for removed elements. For example:
if (p == &back()) { --_size; return end(); }
... should be more like:
if (p == &back())
{
--_size;
(_data + _size)->~T();
return end();
}
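The question's erase overwrites the erased element with the last one, which does not preserve order. Assuming you want std::vector-style semantics where the remaining elements keep their order, a minimal sketch (erase_at and demo_erase are hypothetical names) shifts the tail down with move-assignment and then destroys the now-unused last slot:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Hypothetical order-preserving erase over raw storage holding `size` live elements.
template <class T>
T* erase_at(T* data, std::size_t& size, T* p) {
    T* end = data + size;
    if (p < data || p >= end) return end;
    std::move(p + 1, end, p);  // shift the tail down one slot (move-assignment onto live objects)
    --size;
    (data + size)->~T();       // the old last slot is no longer part of the sequence: destroy it
    return p;                  // the element that followed the erased one, or the new end
}

// Usage: erase the middle of {"a", "b", "c"} and return what is left, concatenated.
std::string demo_erase() {
    void* raw = std::malloc(3 * sizeof(std::string));
    if (!raw) throw std::bad_alloc();
    auto* data = static_cast<std::string*>(raw);
    std::size_t size = 0;
    const char* init[] = {"a", "b", "c"};
    for (const char* s : init)
        new (data + size++) std::string(s);

    std::string* next = erase_at(data, size, data + 1);
    assert(size == 2 && *next == "c");

    std::string result = data[0] + data[1];
    for (std::size_t i = size; i > 0; --i)
        data[i - 1].~basic_string();
    std::free(raw);
    return result;
}
```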
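Your emplace_back manually invokes a destructor, but it shouldn't. emplace_back should only add elements, not remove (and destroy) existing ones. It should be quite similar to push_back, but construct the new element in place directly from the given arguments (forwarding them to the matching T constructor, e.g. the move ctor when passed an rvalue T) instead of copying an existing T.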
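A sketch of emplace_back along these lines, again as a hypothetical free function (emplace_back_into). It takes forwarding references instead of the question's Args const & ... and passes them on with std::forward, so any T constructor can be selected, including the move constructor for an rvalue T. demo_emplace is a usage example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <utility>

// Hypothetical free-function emplace_back: forwards the arguments to whichever
// constructor of T matches them, directly inside the uninitialized slot.
template <class T, class... Args>
bool emplace_back_into(T* data, std::size_t& size, std::size_t capacity, Args&&... args) {
    if (size >= capacity) return false;
    // No destructor call: the slot at data + size holds no object yet.
    new (data + size) T(std::forward<Args>(args)...);
    ++size;
    return true;
}

// Usage: build std::string(3, 'x') in place, without a temporary string.
std::string demo_emplace() {
    void* raw = std::malloc(sizeof(std::string));
    if (!raw) throw std::bad_alloc();
    auto* data = static_cast<std::string*>(raw);
    std::size_t size = 0;

    bool ok = emplace_back_into(data, size, 1, 3, 'x');  // calls std::string(size_type, char)
    assert(ok && size == 1);

    std::string result = data[0];
    data[0].~basic_string();
    std::free(raw);
    return result;
}
```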
Your destructor does this:
~myVector() {
delete[] _data;
}
But again, that's UB when we take this approach. We want something more like:
~myVector() {
for (int j=0; j < _size; ++j)
(_data + j)->~T();
free(_data);
}
There's still a whole lot more to cover, like exception safety, which is a whole different can of worms. But this should get you started with respect to the proper use of placement new in a data structure on top of some memory allocator (malloc/free in this example).
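Putting the pieces together, a sketch of the question's myVector rewritten with raw malloc'd storage, placement new on insertion and explicit destructor calls on removal might look like the following. It is an illustration, not a complete container: copying is disabled, sizes use std::size_t instead of int, and erase is the order-preserving variant (an assumption about the desired semantics). Exception safety only goes as far as incrementing _size after construction succeeds. Tracked is a hypothetical helper that counts live objects so leaks or double destruction would show up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <new>
#include <utility>

template <class T>
class myVector {
public:
    explicit myVector(std::size_t capacity)
        : _data(allocate(capacity)), _capacity(capacity), _size(0) {}

    ~myVector() {
        for (std::size_t j = 0; j < _size; ++j)
            (_data + j)->~T();  // destroy only the constructed elements
        std::free(_data);
    }

    myVector(const myVector&) = delete;
    myVector& operator=(const myVector&) = delete;

    bool push_back(const T& t) { return emplace_back(t); }  // copy-constructs

    template <class... Args>
    bool emplace_back(Args&&... args) {
        if (_size >= _capacity) return false;
        new (_data + _size) T(std::forward<Args>(args)...);
        ++_size;  // only after construction succeeded
        return true;
    }

    T* erase(T* p) {  // order-preserving, like std::vector::erase
        if (p < begin() || p >= end()) return end();
        std::move(p + 1, end(), p);
        --_size;
        (_data + _size)->~T();
        return p;
    }

    std::size_t size() const { return _size; }
    std::size_t capacity() const { return _capacity; }
    T* begin() { return _data; }
    T* end() { return _data + _size; }
    T& operator[](std::size_t i) { assert(i < _size); return _data[i]; }

private:
    static T* allocate(std::size_t n) {
        static_assert(alignof(T) <= alignof(std::max_align_t), "malloc may not align T");
        if (n > std::numeric_limits<std::size_t>::max() / sizeof(T))
            throw std::bad_array_new_length();
        void* p = std::malloc(n * sizeof(T));
        if (!p && n > 0) throw std::bad_alloc();
        return static_cast<T*>(p);
    }

    T* _data;
    std::size_t _capacity;
    std::size_t _size;
};

// Hypothetical element type that counts live instances, used to check for leaks.
struct Tracked {
    inline static int live = 0;
    explicit Tracked(int v) : value(v) { ++live; }
    Tracked(const Tracked& o) : value(o.value) { ++live; }
    Tracked& operator=(const Tracked&) = default;
    ~Tracked() { --live; }
    int value;
};
```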
Last but not least:
(couldn't use stl for memory reasons)
... this might be an unusual reason. Your implementation doesn't necessarily use any less memory than a vector with reserve called in advance to give it the appropriate capacity. You might shave off a few bytes on a per-container level (not on a per-element level) by choosing 32-bit integers and not needing to store an allocator, but it's going to be a very small memory saving in exchange for a boatload of work.
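To check the reserve claim, here is a small sketch with std::vector (filled_without_reallocation is a hypothetical name). The standard guarantees that after reserve(n), insertions don't reallocate until the size would exceed capacity(), so the buffer address stays the same while filling up to n elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if filling a std::vector up to a capacity reserved in advance
// never moved its buffer, i.e. reserve() did the one and only allocation.
bool filled_without_reallocation(std::size_t n) {
    assert(n > 0);
    std::vector<int> v;
    v.reserve(n);  // allocates room for n ints, constructs none
    v.push_back(0);
    const int* first_buffer = v.data();
    for (std::size_t i = 1; i < n; ++i)
        v.push_back(static_cast<int>(i));  // size never exceeds capacity: no reallocation
    return v.data() == first_buffer && v.size() == n && v.capacity() >= n;
}
```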
This kind of thing can be a useful learning exercise, though, to help you build data structures outside the standard in a more standard-compliant way (e.g., unrolled lists, which I find quite useful).
I ended up having to reinvent some vectors and vector-like containers for ABI reasons (we wanted a container we could pass through our API that was guaranteed to have the same ABI regardless of what compiler was used to build a plugin). Even then, I would have much preferred simply using std::vector.
Note that if you just want to take control of how vector allocates memory, you can do that by specifying your own allocator with a compliant interface. This might be useful, for example, if you want a vector that allocates 128-bit aligned memory for use with aligned move instructions using SIMD.
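As a sketch of that, assuming C++17 (for the aligned ::operator new and ::operator delete overloads taking std::align_val_t), a minimal allocator for 16-byte (128-bit) aligned storage could look like this. Aligned16Allocator and buffer_is_16_byte_aligned are hypothetical names, and the allocator assumes alignof(T) <= 16.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <new>
#include <vector>

// Hypothetical minimal allocator that hands out 16-byte (128-bit) aligned memory.
// Assumes alignof(T) <= 16; not a tuned or complete implementation.
template <class T>
struct Aligned16Allocator {
    using value_type = T;
    static constexpr std::size_t alignment = 16;

    Aligned16Allocator() noexcept = default;
    template <class U>
    Aligned16Allocator(const Aligned16Allocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n > std::numeric_limits<std::size_t>::max() / sizeof(T))
            throw std::bad_array_new_length();
        // C++17 aligned form of the global allocation function.
        return static_cast<T*>(::operator new(n * sizeof(T), std::align_val_t{alignment}));
    }

    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, std::align_val_t{alignment});  // must match the aligned new
    }
};

// All instances are interchangeable: memory from one can be freed by another.
template <class T, class U>
bool operator==(const Aligned16Allocator<T>&, const Aligned16Allocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const Aligned16Allocator<T>&, const Aligned16Allocator<U>&) noexcept { return false; }

// Usage: a vector<float> whose buffer starts on a 16-byte boundary.
bool buffer_is_16_byte_aligned() {
    std::vector<float, Aligned16Allocator<float>> v(8, 1.0f);
    return reinterpret_cast<std::uintptr_t>(v.data()) % 16 == 0;
}
```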
Source: https://stackoverflow.com/questions/33906008/c-placement-new-in-a-home-made-vector-container