Choice of the most performant container (array)

后端 未结 5 806
野性不改
野性不改 2021-02-08 02:09

This is my little big question about containers, in particular, arrays.

I am writing a physics code that mainly manipulates a big (> 1 000 000) set of \"particles\" (wit

5条回答
  •  梦如初夏
    2021-02-08 02:22

    If you think that this is the best option, would it be possible to wrap this vector in a way that it can be accessed with a index operator defined as other_array[i,j] // same as other_array[6*i+j] without overhead (like function call at each access) ?

    (other_array[i,j] won't work too well, as i,j employs the comma operator to evaluate the value of "i", then discards that and evaluates and returns "j", so it's equivalent to other_array[i]).

    You will need to use one of:

    other_array[i][j]
    other_array(i, j)  // if other_array implements operator()(int, int),
                       // but std::vector<> et al don't.
    other_array[i].identifier // identifier is a member variable
    other_array[i].identifier() // member function getting value
    other_array[i].identifier(double) // member function setting value
    

    You may or may not prefer to put get_ and set_ or similar on the last two functions should you find them useful, but from your question I think you won't: functions are prefered in APIs between parts of large systems involving many developers, or when the data items may vary and you want the algorithms working on the data to be independent thereof.

    So, a good test: if you find yourself writing code like other_array[i][3] where you've decided "3" is the double with the speed in it, and other_array[i][5] because "5" is the the acceleration, then stop doing that and give them proper identifiers so you can say other_array[i].speed and .acceleration. Then other developers can read and understand it, and you're much less likely to make accidental mistakes. On the other hand, if you are iterating over those 6 elements doing exactly the same things to each, then you probably do want Particle to hold a double[6], or to provide an operator[](int). There's no problem doing both:

    struct Particle
    {
        double x[6];
        double& speed() { return x[3]; }
        double speed() const { return x[3]; }
        double& acceleration() { return x[5]; }
        ...
    };
    

    BTW / the reason that vector > may be too costly is that each set of 6 doubles will be allocated on the heap, and for fast allocation and deallocation many heap implementations use fixed-size buckets, so your small request will be rounded up t the next size: that may be a significant overhead. The outside vector will also need to record a extra pointer to that memory. Further, heap allocation and deallocation is relatively slow - in you're case, you'd only be doing it at startup and shutdown, but there's no particular point in making your program slower for no reason. Even more importantly, the areas on the heap may just around in memory, so your operator[] may have cache-faults pulling in more distinct memory pages than necessary, slowing the entire program. Put another way, vectors store elements contiguously, but the pointed-to-vectors may not be contiguous.

提交回复
热议问题