I\'m a beginner C++ programmer and so I\'ve learnt using arrays rather than vectors (this seems to be the general way to do things, then move on to vectors later on).
<I work on a shared library that needs access to structured data. This data is known at compile time, so it uses file-scoped constant arrays of POD (plain old data) structures to hold the data.
This causes the compiler and linker to put most of the data in a read-only section, with two benefits:
The only exception is that the compiler still generates initialization code to load floating point constants, so any structure containing floating point numbers ends up in a writable section. I suspect that this has something do with floating exceptions or floating point rounding modes, but I'm not sure how to verify either hypothesis.
If I used vector and string objects for this, then the compiler would generate a lot more initialization code that would execute whenever my shared library is loaded. The constant data would be allocated on the heap, so it would not be shareable between processes.
If I read the data in from a file on disk, I would have to deal with checking the format of the data, instead of having the C++ compiler do it for me. I also would have to manage the lifetime of the data in a codebase that has had this global data "baked into" it since the beginning.
I suggest to use arrays whenever you know the size at the compile time itself. Although vector can be used, we have to remember that vector comes with its overhead related to the memory allocation done on heap. If the size is not known, then ofcourse use vectors.
The only reason I could think of would be speed. You can do better optimizations on array/pointer types than on the according objects. But I would even use the STL if I absoluteltely knew the amount of data my data structures need to hold. Changing from STL to primitive types in an optimization step is better than starting the project with harder-to-read code.
Never.
If a raw array seems a better solution than a vector (for reasons other said here) then I use std::tr1::array or std::array in C++11 compilers (or boost::array). It simply does the checks I would do anyway to be sure and the size value usage makes DRY automatically implemented (for example I use the size in loops so that future changes of the array declaration will automatically work).
It's the array implementation "is" a raw array with checks and provided size constant anyway, so it's easy to get the array code in embedded code too because the code is not really "too smart" for any compiler. As far as the compiler support templates, I would copy the boost headers in my code to allow me to use this one instead of raw arrays. Because it's clearly too easy to make mistakes with raw arrays. Raw arrays are evil. They are error prone.
And it work very well with STL algorithms (if available).
Now, there is two cases where you need to use raw arrays (obligation) : when you're in C-only code (not communicating with C code, but writing in C-only part of code, like a C library). But then it's another language.
The other reason is when the compiler don't support templates at all...
This question could actually be separated into two parts:
I personally prefer to use std::vector for managing memory except in cases where I need to maintain compatibility with code that doesn't use STL (i.e. when interfacing with straight C code). It's much harder to make exception-safe code with raw arrays allocated via new or malloc (in part because it's really easy to forget that you need to worry about it). See any article on RAII for the reasons.
In practice, std::vector is implemented as a flat array. As such, it's always possible to pull out the raw array and use C-style access patterns. I typically start with the vector subscript operator syntax. For some compilers, when producing a debug version, vectors provide automatic boundary checking. This is slow (often a 10x slowdown for tight loops), but helpful in finding certain types of bugs.
If profiling on a particular platform indicates that the operator[] is a bottleneck, then I switch to directly accessing the raw array. Interestingly, depending on the compiler and OS, it can sometimes be faster to use an STL vector than a raw array.
Here are some results from a simple test application. It was compiled with Visual Studio 2008 in 32-bit Release mode using /O2 optimizations and run on Vista x64. Similar results are achieved with a 64-bit test application.
Binary search...
fill vector (for reference) : 0.27 s
array with ptr math : 0.38 s <-- C-style pointers lose
array with int index : 0.23 s <-- [] on raw array wins
array with ptrdiff_t index : 0.24 s
vector with int index : 0.30 s <-- small penalty for vector abstraction
vector with ptrdiff_t index : 0.30 s
Counting memory (de)allocation...
memset (for reference) : 2.85 s
fill malloc-ed raw array with [] : 2.66 s
fill malloc-ed raw array with ptr : 2.81 s
fill new-ed raw array with [] : 2.64 s
fill new-ed raw array with ptr : 2.65 s
fill vector as array : 3.06 s \ something's slower
fill vector : 3.05 s / with vector!
NOT counting memory (de)allocation...
memset (for reference) : 2.57 s
fill malloc-ed raw array with [] : 2.86 s
fill malloc-ed raw array with ptr : 2.60 s
fill new-ed raw array with [] : 2.63 s
fill new-ed raw array with ptr : 2.78 s
fill vector as array : 2.49 s \ after discounting the
fill vector : 2.54 s / (de)allocation vector is faster!
Code:
#define WINDOWS_LEAN_AND_MEAN
#include <windows.h>
#include <string>
#include <vector>
#include <stdio.h>
using namespace std;
__int64 freq; // initialized in main
int const N = 1024*1024*1024/sizeof(int)/2; // 1/2 GB of data
int const nIter = 10;
class Timer {
public:
Timer(char *name) : name(name) {
QueryPerformanceCounter((LARGE_INTEGER*)&start);
}
~Timer() {
__int64 stop;
QueryPerformanceCounter((LARGE_INTEGER*)&stop);
printf(" %36s : % 4.2f s\n", name.c_str(), (stop - start)/double(freq));
}
private:
string const name;
__int64 start;
};
template <typename Container, typename Index>
int binarySearch_indexed(Container sortedArray, Index first, Index last, int key) {
while (first <= last) {
Index mid = (first + last) / 2; // NOT safe if (first+last) is too big!
if (key > sortedArray[mid]) first = mid + 1;
else if (key < sortedArray[mid]) last = mid - 1;
else return mid;
}
return 0; // Use "(Index)-1" in real code
}
int Dummy = -1;
int const *binarySearch_ptr(int const *first, int const *last, int key) {
while (first <= last) {
int const *mid = (int const *)(((unsigned __int64)first + (unsigned __int64)last) / 2);
if (key > *mid) first = mid + 1;
else if (key < *mid) last = mid - 1;
else return mid;
}
return &Dummy; // no NULL checks: don't do this for real
}
void timeFillWithAlloc() {
printf("Counting memory (de)allocation...\n");
{
Timer tt("memset (for reference)");
int *data = (int*)malloc(N*sizeof(int));
for (int it=0; it<nIter; it++) memset(data, 0, N*sizeof(int));
free(data);
}
{
Timer tt("fill malloc-ed raw array with []");
int *data = (int*)malloc(N*sizeof(int));
for (int it=0; it<nIter; it++) for (size_t i=0; i<N; i++) data[i] = (int)i;
free(data);
}
{
Timer tt("fill malloc-ed raw array with ptr");
int *data = (int*)malloc(N*sizeof(int));
for (int it=0; it<nIter; it++) {
int *d = data;
for (size_t i=0; i<N; i++) *d++ = (int)i;
}
free(data);
}
{
Timer tt("fill new-ed raw array with []");
int *data = new int[N];
for (int it=0; it<nIter; it++) for (size_t i=0; i<N; i++) data[i] = (int)i;
delete [] data;
}
{
Timer tt("fill new-ed raw array with ptr");
int *data = new int[N];
for (int it=0; it<nIter; it++) {
int *d = data;
for (size_t i=0; i<N; i++) *d++ = (int)i;
}
delete [] data;
}
{
Timer tt("fill vector as array");
vector<int> data(N);
for (int it=0; it<nIter; it++) {
int *d = &data[0];
for (size_t i=0; i<N; i++) *d++ = (int)i;
}
}
{
Timer tt("fill vector");
vector<int> data(N);
for (int it=0; it<nIter; it++) for (size_t i=0; i<N; i++) data[i] = (int)i;
}
printf("\n");
}
void timeFillNoAlloc() {
printf("NOT counting memory (de)allocation...\n");
{
int *data = (int*)malloc(N*sizeof(int));
{
Timer tt("memset (for reference)");
for (int it=0; it<nIter; it++) memset(data, 0, N*sizeof(int));
}
free(data);
}
{
int *data = (int*)malloc(N*sizeof(int));
{
Timer tt("fill malloc-ed raw array with []");
for (int it=0; it<nIter; it++) for (size_t i=0; i<N; i++) data[i] = (int)i;
}
free(data);
}
{
int *data = (int*)malloc(N*sizeof(int));
{
Timer tt("fill malloc-ed raw array with ptr");
for (int it=0; it<nIter; it++) {
int *d = data;
for (size_t i=0; i<N; i++) *d++ = (int)i;
}
}
free(data);
}
{
int *data = new int[N];
{
Timer tt("fill new-ed raw array with []");
for (int it=0; it<nIter; it++) for (size_t i=0; i<N; i++) data[i] = (int)i;
}
delete [] data;
}
{
int *data = new int[N];
{
Timer tt("fill new-ed raw array with ptr");
for (int it=0; it<nIter; it++) {
int *d = data;
for (size_t i=0; i<N; i++) *d++ = (int)i;
}
}
delete [] data;
}
{
vector<int> data(N);
{
Timer tt("fill vector as array");
for (int it=0; it<nIter; it++) {
int *d = &data[0];
for (size_t i=0; i<N; i++) *d++ = (int)i;
}
}
}
{
vector<int> data(N);
{
Timer tt("fill vector");
for (int it=0; it<nIter; it++) for (size_t i=0; i<N; i++) data[i] = (int)i;
}
}
printf("\n");
}
void timeBinarySearch() {
printf("Binary search...\n");
vector<int> data(N);
{
Timer tt("fill vector (for reference)");
for (size_t i=0; i<N; i++) data[i] = (int)i;
}
{
Timer tt("array with ptr math");
int sum = 0;
for (int i=-1000000; i<1000000; i++) {
sum += *binarySearch_ptr(&data[0], &data[0]+data.size(), i);
}
}
{
Timer tt("array with int index");
int sum = 0;
for (int i=-1000000; i<1000000; i++) {
sum += data[binarySearch_indexed<int const *, int>(
&data[0], 0, (int)data.size(), -1)];
}
}
{
Timer tt("array with ptrdiff_t index");
int sum = 0;
for (int i=-1000000; i<1000000; i++) {
sum += data[binarySearch_indexed<int const *, ptrdiff_t>(
&data[0], 0, (ptrdiff_t)data.size(), -1)];
}
}
{
Timer tt("vector with int index");
int sum = 0;
for (int i=-1000000; i<1000000; i++) {
sum += data[binarySearch_indexed<vector<int> const &, int>(
data, 0, (int)data.size(), -1)];
}
}
{
Timer tt("vector with ptrdiff_t index");
int sum = 0;
for (int i=-1000000; i<1000000; i++) {
sum += data[binarySearch_indexed<vector<int> const &, ptrdiff_t>(
data, 0, (ptrdiff_t)data.size(), -1)];
}
}
printf("\n");
}
int main(int argc, char **argv)
{
QueryPerformanceFrequency((LARGE_INTEGER*)&freq);
timeBinarySearch();
timeFillWithAlloc();
timeFillNoAlloc();
return 0;
}
Array/char* is useful when compatibility or performance is of very high priority. Vectors and Strings are higher-level objects that are better when code maintainability, readability and overal easiness counts. Almost always, that is.