Most efficient way of erasing/deleting multiple std::vector elements while retaining original order?

前端 未结 7 1405
隐瞒了意图╮
隐瞒了意图╮ 2020-12-23 17:37


i have a std::vector and a second container holding iterators or indexes (no keys, i want constant access to the element) to this vector for del

相关标签:
7条回答
  • 2020-12-23 18:21

    How about looping through the vector, and for each element that needs to be removed, copy the next element that doesn't need to be removed in to that position. Then when you get to the end, truncate it.

    int last = 0;
    for(int i=0; i<vec.size(); ++i, ++last)
    {
       while(needs_to_be_removed(i))
          ++i;
       if(i >= vec.size()) break;
    
       vec[last] = vec[i];   
    }
    
    vec.resize(last);
    
    0 讨论(0)
  • 2020-12-23 18:26

    You can copy all elements of the vector to a list unless the index in your second container, and then back to a vector. Even with your algorithm of going from the end of the vector to the front, there's a lot of work going on behind the scenes in your vector.

    Make your second container a map so it keeps the indeces sorted for you automatically.

    edit:

    To respond to the comment

    The cost of maintaining a map is worst case the same as maintaining another structure (list or vector) and then sorting it. If you're already doing that, you might as well keep it as a map. It doesn't make sense to complain about the overhead of a map vs. the overhead of sorting a list.

    As for the performance of my suggested algorithm, if m is the number of elements to be deleted, and n is the total number of elements then it results in O(n - m).

    Of course, this is mostly just humoring your attempt to optimize with a vector.

    1 - You shouldn't be using a vector if you want to do random access deletes. That's not what they're good at, use a list if at all possible. And since you seem to be much more interested in relative order rather than absolute index, I am wondering why a vector is needed at all. If you gave the entire problem, there's probably a common solution to let you use the most efficient data structure to solve it.

    2 - Instead of maintaining a second data structure, mark elements that need to be deleted directly in their container. A trivial way is instead using a container< T > use a container< std::pair< T, char > > and use the char to keep track of the element status.

    If you do 1 and 2, you remove all copying completely and get a much more efficient implementation.

    0 讨论(0)
  • 2020-12-23 18:27

    In <algorithm> there is a remove_if function which squeezes all values not removed to the front maintaining the order. This works if those 200 elements can be purely determined by the values, not index.

    This is essentially the Erase-remove idiom you have linked to. remove_if is guaranteed to perform O(N) comparisons (and at most O(N) copyings), which would be more efficient than sorting (O(N log N)), although your last option doesn't actually require sorting if the indices are determined from values (just scan in the reversed direction while copying).

    Nevertheless, using remove_if (if you can) is better than the other 2 options because the implementation has already been written for you, so there's less chance of logical error and conveys better what (not how) to do.

    0 讨论(0)
  • 2020-12-23 18:30

    First thing is, don't call erase more times than you have to, because for a vector it shuffles all the later elements down, giving the whole operation an Ω(n*m) worst case run time (n the size of the vector, m the size of the list of indexes to remove).

    I think the first thing I'd try would be similar to your current code:

    • sort the indexes
    • create a new vector of size n - m
    • iterate over the original vector, copying indexes[0] elements, skipping an element, then copying indexes[1] - indexes[0] - 1 elements, skip an element, and so on.
    • swap the original vector with the new one.

    You might be able to do the third step with remove_copy_if and a predicate which contains state (counting how many items it has copied and how far it is through the sorted list of indexes), but for extremely tedious and obscure reasons this isn't guaranteed to work (algorithm predicates with mutable state are problematic, it seems to be the consensus that the standard doesn't guarantee that the same copy of the predicate is used throughout the algorithm). So I really don't advise trying it, but it might help to bear in mind that what you're writing basically is a modified version of remove_copy_if.

    You could avoid the second step using a back_inserter rather than presizing the vector, although you'd presumably still reserve the space in advance.

    [Edit: come to think of it, why am I copying anything? Rather than implementing a modified remove_copy_if, implement a modified remove_if, and just copy to an earlier point in the vector. Then erase/resize at the end. I wouldn't worry about the O(m log m) sort of the indexes until proven to be a problem, because it's unlikely to be significantly slower than the Ω(m) operation to read all the values to be removed, and store them in some kind of container. Then, using this container in the predicate to remove_if may or may not be O(1). Sorting might turn out faster for plausible values of m.]

    0 讨论(0)
  • 2020-12-23 18:33

    Elements of what? Maybe I'm taking your post to seriously but if you have a vector of 1000 elements why not mark the ones that are not valid anymore and do away with erasing in the first place. Obviously I'm making an assumption here that your elements are not demanding a lot of memory.

    I only bring this up because you seem to be concerned with speed. If the suggestions already given don't do the trick maybe this idea is worth a thought! In essence speed things up by not doing the operation in the first place.

    0 讨论(0)
  • 2020-12-23 18:35

    I've written a function, based on Benjamin Lindley answer https://stackoverflow.com/a/4115582/2835054.

    #include <iostream>
    #include <algorithm>
    #include <vector>
    
    template <typename elementType, typename indexType>
    void remove_multiple_elements_from_vector(std::vector<elementType> &vector,
    std::vector<indexType> &indexes)
    {
        // 1. indexType is any integer.
        // 2. elementType is any type.
        // 3. Indexes should be unique.
        // 4. The largest index inside indexes shouldn't be larger than
        //    the largetst index in the vector.
        // 5. Indexes should be sorted in ascending order
        //    (it is done inside function).
        std::sort(indexes.begin(), indexes.end());
        indexType currentIndexInIndexesVector = 0;
        indexType last = 0;
        for(indexType i=0; i<vector.size(); ++i, ++last)
        {
           while(indexes[currentIndexInIndexesVector] == i)
           {
              ++i;
              ++currentIndexInIndexesVector;
           }
           if(i >= vector.size()) break;
    
           vector[last] = vector[i];   
        }
    
        vector.resize(last);
    }
    
    
    int main()
    {
        std::vector<int> vector = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        std::vector<int> indexes = {0, 10, 5};
    
        for (auto &vectorElement : vector)
        {
            std::cout << vectorElement << " ";
        }    
        std::cout << "\n";
    
        remove_multiple_elements_from_vector<int, int>(vector, indexes);
    
        for (auto &vectorElement : vector)
        {
            std::cout << vectorElement << " ";
        }
    }
    
    0 讨论(0)
提交回复
热议问题