Efficient linked list in C++?

孤街浪徒 2021-02-02 06:36

This document says std::list is inefficient:

std::list is an extremely inefficient class that is rarely useful. It performs a heap allocation

11 answers
  • 2021-02-02 07:00

    The requirement of not invalidating iterators, except the one on a node being deleted, rules out every container that doesn't allocate individual nodes, i.e. anything that works very differently from e.g. list or map.
    However, I've found that in almost every case where I thought this was necessary, it turned out that with a little discipline I could just as well do without. You might want to check whether you can drop the requirement; you would benefit greatly.

    While std::list is indeed the "correct" thing if you need something like a list (for CS class, mostly), the statement that it is almost always the wrong choice is, unfortunately, exactly right. While the O(1) claim is entirely true, it's nevertheless abysmal in relation to how actual computer hardware works, which gives it a huge constant factor. Note that not only are the objects that you iterate over randomly placed, but the nodes that you maintain are, too (yes, you can somehow work around that with an allocator, but that is not the point). On average, you have one guaranteed cache miss for anything you do, plus one dynamic allocation for every mutating operation (two of each if the object and the node are allocated separately).

    Edit: As pointed out by @ratchetfreak below, implementations of std::list commonly collapse the object and node allocation into one memory block as an optimization (akin to what e.g. make_shared does), which makes the average case somewhat less catastrophic (one allocation per mutation and one guaranteed cache miss instead of two).
    A new, different consideration in this case might be that doing so may not be entirely trouble-free either. Postfixing the object with two pointers means reversing direction while dereferencing, which may interfere with automatic prefetching.
    Prefixing the object with the pointers, on the other hand, means you push the object back by two pointers' size, which is as much as 16 bytes on a 64-bit system (that might split a mid-sized object across cache line boundaries every time). Also, consider that std::list cannot afford to break e.g. SSE code solely because it adds a clandestine offset as a special surprise (so, for example, the xor-trick would likely not be applicable for reducing the two-pointer footprint). There would likely have to be some amount of "safe" padding to make sure objects added to a list still work the way they should.
    I am unable to tell whether these are actual performance problems or merely distrust and fear from my side, but I believe it's fair to say that there may be more snakes hiding in the grass than one expects.

    It's not for no reason that high-profile C++ experts (Stroustrup, notably) recommend using std::vector unless you have a really good reason not to.

    Like many people before, I've tried to be smart about using (or inventing) something better than std::vector for one or the other particular, specialized problem where it seems you can do better, but it turns out that simply using std::vector is still almost always the best, or second best option (if std::vector happens to be not-the-best, std::deque is usually what you need instead).
    You have way fewer allocations than with any other approach, way less memory fragmentation, way fewer indirections, and a much more favorable memory access pattern. And guess what, it's readily available and just works.
    The fact that every now and then inserts require a copy of all elements is (usually) a total non-issue. You think it is, but it's not. It happens rarely and it is a copy of a linear block of memory, which is exactly what processors are good at (as opposed to many double-indirections and random jumps over memory).

    If the requirement not to invalidate iterators is really an absolute must, you could for example pair a std::vector of objects with a dynamic bitset or, for lack of something better, a std::vector<bool>. Then use reserve() appropriately so reallocations do not happen. When deleting an element, do not remove it but only mark it as deleted in the bitmap (call the destructor by hand). At appropriate times, when you know that it's OK to invalidate iterators, call a "vacuum cleaner" function that compacts both the bit vector and the object vector. There, all unforeseeable iterator invalidations gone.

    Yes, that requires maintaining one extra "element was deleted" bit, which is annoying. But a std::list must maintain two pointers as well, in addition to the actual object, and it must do allocations. With the vector (or two vectors), access is still very efficient, as it happens in a cache-friendly way. Iterating, even when checking for deleted nodes, still means you move linearly or almost linearly over memory.
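
    The vector-plus-bitmap idea described above could be sketched roughly as follows. This is a simplified illustration, not a definitive implementation: the class name TombstoneVector and its member names are made up here, indices stand in for iterators, and an erased element is only destroyed when it is compacted away (the answer suggests calling the destructor by hand at erase time instead).

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: a vector whose erase() only marks a slot as deleted.
template <class T>
class TombstoneVector {
public:
    // Reserving up front means no reallocation happens while the number
    // of slots stays within `capacity`.
    explicit TombstoneVector(std::size_t capacity) {
        items_.reserve(capacity);
        alive_.reserve(capacity);
    }

    // Appends a value and returns its index, which stays valid until vacuum().
    std::size_t push_back(const T& value) {
        items_.push_back(value);
        alive_.push_back(true);
        return items_.size() - 1;
    }

    // Marks the slot as deleted; nothing moves, so other indices stay valid.
    void erase(std::size_t i) { alive_[i] = false; }

    bool alive(std::size_t i) const { return alive_[i]; }
    const T& operator[](std::size_t i) const { return items_[i]; }
    std::size_t slots() const { return items_.size(); }

    // The "vacuum cleaner": compacts both vectors in one linear pass.
    // Call it only when invalidating indices is acceptable.
    void vacuum() {
        std::size_t out = 0;
        for (std::size_t i = 0; i < items_.size(); ++i) {
            if (!alive_[i]) continue;
            if (out != i) items_[out] = std::move(items_[i]);
            ++out;
        }
        items_.erase(items_.begin() + static_cast<std::ptrdiff_t>(out), items_.end());
        alive_.assign(out, true);
    }

private:
    std::vector<T> items_;
    std::vector<bool> alive_;
};
```

    A loop over such a container would skip slots for which alive(i) is false; the walk is still linear over memory.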

  • 2021-02-02 07:00

    I just wanted to make a small comment about your choice. I'm a huge fan of vector because of its read speed: you can directly access any element, and sort if need be (a vector of class/struct objects, for example).

    But I digress. There are two nifty tips I wanted to share. With vector, inserts can be expensive, so a neat trick is: don't insert if you can get away with not doing it. Do a normal push_back (put the element at the end), then swap it with the element at the position you want.

    The same goes for deletes. They are expensive. So swap the element with the last one, then delete it from the end.
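
    Both tricks can be sketched in a few lines. The helper names unordered_insert and unordered_erase are made up for this illustration. Note that neither operation preserves the relative order of the elements, so this only applies when order does not matter.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Removes v[i] in O(1) by moving the last element into its place.
// The relative order of the remaining elements is not preserved.
template <class T>
void unordered_erase(std::vector<T>& v, std::size_t i) {
    if (i + 1 != v.size()) v[i] = std::move(v.back());
    v.pop_back();
}

// Places `value` at position i in O(1) (amortized): the new value is
// appended and then swapped with the element that used to be at i.
// Precondition: i <= v.size() before the call.
template <class T>
void unordered_insert(std::vector<T>& v, std::size_t i, T value) {
    v.push_back(std::move(value));
    std::swap(v[i], v.back());
}
```

    For example, erasing index 1 from {1, 2, 3, 4} leaves {1, 4, 3}.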

  • 2021-02-02 07:02

    Thanks for all the answers. This is a simple - though not rigorous - benchmark.

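    // list.cc
    #include <cstddef>
    #include <iterator>
    #include <list>
    using namespace std;
    
    int main() {
        // Repeat the experiment 100000 times to get a measurable runtime.
        for (size_t k = 0; k < 100000; k++) {
            list<size_t> ln;
            for (size_t i = 0; i < 200; i++) {
                ln.insert(ln.begin(), i);                 // insert at the front
                if (i != 0 && i % 20 == 0) {
                    ln.erase(std::next(ln.begin(), 5));   // erase the sixth element
                }
            }
        }
    }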
    

    and

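    // vector.cc
    #include <cstddef>
    #include <iterator>
    #include <vector>
    using namespace std;
    
    int main() {
        // Repeat the experiment 100000 times to get a measurable runtime.
        for (size_t k = 0; k < 100000; k++) {
            vector<size_t> vn;
            for (size_t i = 0; i < 200; i++) {
                vn.insert(vn.begin(), i);                 // insert at the front (shifts all elements)
                if (i != 0 && i % 20 == 0) {
                    vn.erase(std::next(vn.begin(), 5));   // erase the sixth element
                }
            }
        }
    }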
    

    This test aims to test what std::list claims to excel at - O(1) inserting and erasing. And, because of the positions I ask to insert/delete, this race is heavily skewed against std::vector, because it has to shift all the following elements (hence O(n)), while std::list doesn't need to do that.

    Now I compile them.

    clang++ list.cc -o list
    clang++ vector.cc -o vector
    

    And test the runtime. The result is:

      time ./list
      ./list  4.01s user 0.05s system 91% cpu 4.455 total
      time ./vector
      ./vector  1.93s user 0.04s system 78% cpu 2.506 total
    

    std::vector has won.

    Compiled with -O3 optimization, std::vector still wins.

      time ./list
      ./list  2.36s user 0.01s system 91% cpu 2.598 total
      time ./vector
      ./vector  0.58s user 0.00s system 50% cpu 1.168 total
    

    std::list has to call heap allocation for each element, while std::vector can allocate heap memory in batch (though it might be implementation-dependent), hence std::list's insert/delete has a higher constant factor, though it is O(1).

    No wonder this document says

    std::vector is well loved and respected.

    EDIT: std::deque does even better in some cases, at least for this task.

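    // deque.cc
    #include <cstddef>
    #include <deque>
    #include <iterator>
    using namespace std;
    
    int main() {
        // Repeat the experiment 100000 times to get a measurable runtime.
        for (size_t k = 0; k < 100000; k++) {
            deque<size_t> dn;
            for (size_t i = 0; i < 200; i++) {
                dn.insert(dn.begin(), i);                 // insert at the front
                if (i != 0 && i % 20 == 0) {
                    dn.erase(std::next(dn.begin(), 5));   // erase the sixth element
                }
            }
        }
    }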
    

    Without optimization:

    ./deque  2.13s user 0.01s system 86% cpu 2.470 total
    

    Optimized with -O3:

    ./deque  0.27s user 0.00s system 50% cpu 0.551 total
    
  • 2021-02-02 07:04

    Use two std::lists: One "free-list" that's preallocated with a large stash of nodes at startup, and the other "active" list into which you splice nodes from the free-list. This is constant time and doesn't require allocating a node.
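
    A minimal sketch of the two-list idea, assuming T is default-constructible and copy-assignable. The wrapper name PooledList and its members are invented for this illustration. It relies on the single-element overload of std::list::splice, which relinks a node in constant time without allocating and keeps iterators to the moved node valid.

```cpp
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical wrapper: nodes are allocated once up front and then recycled.
template <class T>
class PooledList {
public:
    using iterator = typename std::list<T>::iterator;

    // Preallocates `capacity` default-constructed nodes in the free list.
    explicit PooledList(std::size_t capacity) : free_(capacity) {}

    // Takes a node from the free list and links it in front of `pos`.
    // No allocation happens unless the pool is exhausted.
    iterator insert(iterator pos, const T& value) {
        if (free_.empty()) free_.emplace_back();  // fallback: one allocation
        iterator node = free_.begin();
        *node = value;
        active_.splice(pos, free_, node);
        return node;
    }

    // Returns the node to the free list; iterators to other nodes stay valid.
    // The old value is kept in the node until it is overwritten on reuse.
    iterator erase(iterator pos) {
        iterator following = std::next(pos);
        free_.splice(free_.begin(), active_, pos);
        return following;
    }

    iterator begin() { return active_.begin(); }
    iterator end() { return active_.end(); }
    std::size_t size() const { return active_.size(); }
    std::size_t spare() const { return free_.size(); }

private:
    std::list<T> active_;
    std::list<T> free_;
};
```

    The nodes are still scattered individually on the heap, so this removes the per-operation allocation cost but not the cache misses during traversal.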

  • 2021-02-02 07:05

    The simplest way I see to fulfill all your requirements:

    1. Constant-time insertion/removal (hope amortized constant-time is okay for insertion).
    2. No heap allocation/deallocation per element.
    3. No iterator invalidation on removal.

    ... would be something like this, just making use of std::vector:

    #include <type_traits>
    #include <vector>
    
    template <class T>
    struct Node
    {
        // Stores the memory for an instance of 'T'.
        // Use placement new to construct the object and
        // manually invoke its dtor as necessary.
        typename std::aligned_storage<sizeof(T), alignof(T)>::type element;
    
        // Points to the next element or the next free
        // element if this node has been removed.
        int next;
    
        // Points to the previous element.
        int prev;
    };
    
    template <class T>
    class NodeIterator
    {
    public:
        // ...
    private:
        // The iterator stores an index rather than a pointer, so it
        // survives reallocation of the underlying vector.
        std::vector<Node<T>>* nodes;
        int index;
    };
    
    template <class T>
    class Nodes
    {
    public:
        // ...
    private:
        // Stores all the nodes.
        std::vector<Node<T>> nodes;
    
        // Points to the first free node or -1 if the free list
        // is empty. Initially this starts out as -1.
        int free_head = -1;
    };
    

    ... and hopefully with a better name than Nodes (I'm slightly tipsy and not so good at coming up with names at the moment). I'll leave the implementation up to you but that's the general idea. When you remove an element, just do a doubly-linked list removal using the indices and push it to the free head. The iterator doesn't invalidate since it stores an index to a vector. When you insert, check if the free head is -1. If not, overwrite the node at that position and pop. Otherwise push_back to the vector.
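
    A compact sketch of that general idea, under a few stated assumptions: the class name IndexedList and its interface are invented here, plain int indices stand in for iterators, and std::optional<T> (C++17) replaces the aligned_storage plus placement-new approach so that construction and destruction are handled automatically. It is an illustration, not the answerer's actual implementation.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical sketch of a doubly-linked list stored inside one std::vector.
template <class T>
class IndexedList {
public:
    // Inserts `value` before node `pos` (pass -1 to append).
    // Returns the index of the new node.
    int insert(int pos, T value) {
        int n = free_head_;
        if (n != -1) {
            free_head_ = nodes_[n].next;          // pop a hole off the free stack
        } else {
            n = static_cast<int>(nodes_.size());  // no holes: grow the vector
            nodes_.emplace_back();
        }
        Node& node = nodes_[n];
        node.element = std::move(value);
        node.next = pos;
        node.prev = (pos == -1) ? tail_ : nodes_[pos].prev;
        if (node.prev != -1) nodes_[node.prev].next = n; else head_ = n;
        if (pos != -1) nodes_[pos].prev = n; else tail_ = n;
        return n;
    }

    // Unlinks node `n`, destroys its element and pushes the slot onto the
    // free stack. Indices of all other nodes stay valid.
    void erase(int n) {
        Node& node = nodes_[n];
        assert(node.element.has_value());
        if (node.prev != -1) nodes_[node.prev].next = node.next; else head_ = node.next;
        if (node.next != -1) nodes_[node.next].prev = node.prev; else tail_ = node.prev;
        node.element.reset();
        node.next = free_head_;
        free_head_ = n;
    }

    int head() const { return head_; }
    int next(int n) const { return nodes_[n].next; }
    T& operator[](int n) { return *nodes_[n].element; }

private:
    struct Node {
        std::optional<T> element;  // empty while the slot is on the free stack
        int next = -1;             // next element, or next free slot
        int prev = -1;
    };
    std::vector<Node> nodes_;
    int head_ = -1;
    int tail_ = -1;
    int free_head_ = -1;
};
```

    Traversal is a loop of the form `for (int n = list.head(); n != -1; n = list.next(n))`. A slot freed by erase is the first one handed out by the next insert.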

    Illustration

    Diagram (nodes are stored contiguously inside std::vector, we simply use index links to allow skipping over elements in a branchless way along with constant-time removals and insertions anywhere):

    Let's say we want to remove a node. This is your standard doubly-linked list removal, except we use indices instead of pointers and you also push the node to the free list (which just involves manipulating integers):

    [Figure: removal adjustment of links]

    [Figure: pushing the removed node to the free list]

    Now let's say you insert to this list. In that case, you pop off the free head and overwrite the node at that position.

    [Figure: the list after insertion]

    Insertion to the middle in constant-time should likewise be easy to figure out. Basically you just insert to the free head or push_back to the vector if the free stack is empty. Then you do your standard double-linked list insertion. Logic for the free list (though I made this diagram for someone else and it involves an SLL, but you should get the idea):

    Make sure you properly construct and destroy the elements using placement new and manual calls to the dtor on insertion/removal. If you really want to generalize it, you'll also need to think about exception-safety and we also need a read-only const iterator.

    Pros and Cons

    The benefit of such a structure is that it does allow very rapid insertions/removals from anywhere in the list (even for a gigantic list), insertion order is preserved for traversal, and it never invalidates the iterators to elements which aren't directly removed (though it will invalidate pointers to them; use deque if you don't want pointers to be invalidated). Personally I'd find more use for it than std::list (which I practically never use).

    For large enough lists (say, larger than your entire L3 cache as a case where you should definitely expect a huge edge), this should vastly outperform std::vector for removals and insertions to/from the middle and front. Removing elements from vector can be quite fast for small ones, but try removing a million elements from a vector starting from the front and working towards the back. There things will start to crawl while this one will finish in the blink of an eye. std::vector is ever-so-slightly overhyped IMO when people start using its erase method to remove elements from the middle of a vector spanning 10k elements or more, though I suppose this is still preferable over people naively using linked lists everywhere in a way where each node is individually allocated against a general-purpose allocator while causing cache misses galore.

    The downside is that it only supports sequential access, requires the overhead of two integers per element, and as you can see in the above diagram, its spatial locality degrades if you constantly remove things sporadically.

    Spatial Locality Degradation

    The loss of spatial locality as you start removing and inserting a lot from/to the middle will lead to zig-zagging memory access patterns, potentially evicting data from a cache line only to go back and reload it during a single sequential loop. This is generally inevitable with any data structure that allows removals from the middle in constant time while likewise allowing that space to be reclaimed and preserving the order of insertion. However, you can restore spatial locality by offering a dedicated compaction method, or by copying/swapping the list. The copy constructor can iterate through the source list and insert all of its elements in order, which gives you back a perfectly contiguous, cache-friendly vector with no holes (though doing this will invalidate iterators).
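
    The compaction step might look like the following sketch. The node layout Link and the function name compact are assumptions made for this example (int payload, -1 as the end marker). The function walks the old storage in list order and writes the nodes back to back into a fresh vector.

```cpp
#include <vector>

// Hypothetical node layout: a payload plus index links, -1 meaning "none".
struct Link {
    int value;
    int next;
    int prev;
};

// Rebuilds the node storage in traversal order, starting at `head`.
// In the result, node i links to i - 1 and i + 1, holes are gone and the
// new head is index 0. All old indices are invalidated.
std::vector<Link> compact(const std::vector<Link>& nodes, int head) {
    std::vector<Link> out;
    out.reserve(nodes.size());
    for (int n = head; n != -1; n = nodes[n].next) {
        int i = static_cast<int>(out.size());
        out.push_back(Link{nodes[n].value, -1, i - 1});
        if (i > 0) out[i - 1].next = i;
    }
    return out;
}
```

    Swapping the result into place afterwards gives sequential traversal a purely linear memory access pattern again.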

    Alternative: Free List Allocator

    An alternative that meets your requirements is to implement a free list conforming to std::allocator and use it with std::list. I never liked reaching around data structures and messing around with custom allocators, though, and that approach would double the memory use of the links on 64-bit by using pointers instead of 32-bit indices. So I'd personally prefer the above solution, using std::vector as basically your memory allocator analogue and indices instead of pointers (indices both reduce size and become a requirement if we use std::vector, since pointers would be invalidated when the vector reallocates to a new capacity).
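
    For comparison, a free-list allocator for std::list could be sketched like this. The name FreeListAllocator is made up, and the sketch makes strong simplifying assumptions: it is not thread-safe, it ignores over-aligned types, and recycled blocks are never handed back to the system.

```cpp
#include <cstddef>
#include <list>
#include <new>

// Hypothetical minimal allocator: single-object blocks are recycled through
// a free list shared by all allocators of the same value type.
template <class T>
struct FreeListAllocator {
    using value_type = T;

    FreeListAllocator() = default;
    template <class U>
    FreeListAllocator(const FreeListAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n == 1 && head() != nullptr) {
            FreeNode* node = head();               // reuse a recycled block
            head() = node->next;
            return static_cast<T*>(static_cast<void*>(node));
        }
        std::size_t bytes = n * sizeof(T);
        if (bytes < sizeof(FreeNode)) bytes = sizeof(FreeNode);
        return static_cast<T*>(::operator new(bytes));
    }

    void deallocate(T* p, std::size_t n) noexcept {
        if (n != 1) { ::operator delete(p); return; }
        // Reuse the block's own memory to store the free-list link.
        head() = ::new (static_cast<void*>(p)) FreeNode{head()};
    }

private:
    struct FreeNode { FreeNode* next; };
    static FreeNode*& head() {
        static FreeNode* h = nullptr;
        return h;
    }
};

// All instances share the same state, so they always compare equal.
template <class T, class U>
bool operator==(const FreeListAllocator<T>&, const FreeListAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const FreeListAllocator<T>&, const FreeListAllocator<U>&) noexcept { return false; }
```

    It would be used as `std::list<int, FreeListAllocator<int>>`. The list rebinds the allocator to its internal node type, so erased nodes are recycled by later insertions.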

    Indexed Linked Lists

    I call this kind of thing an "indexed linked list" as the linked list isn't really a container so much as a way of linking together things already stored in an array. And I find these indexed linked lists exponentially more useful since you don't have to get knee-deep in memory pools to avoid heap allocations/deallocations per node and can still maintain reasonable locality of reference (great LOR if you can afford to post-process things here and there to restore spatial locality).

    You can also make this singly-linked if you add one more integer to the node iterator to store the previous node index (comes free of memory charge on 64-bit assuming 32-bit alignment requirements for int and 64-bit for pointers). However, you then lose the ability to add a reverse iterator and make all iterators bidirectional.

    Benchmark

    I whipped up a quick version of the above since you seem interested in 'em: release build, MSVC 2012, no checked iterators or anything like that:

    --------------------------------------------
    - test_vector_linked
    --------------------------------------------
    Inserting 200000 elements...
    time passed for 'inserting': {0.000015 secs}
    
    Erasing half the list...
    time passed for 'erasing': {0.000021 secs}
    time passed for 'iterating': {0.000002 secs}
    time passed for 'copying': {0.000003 secs}
    
    Results (up to 10 elements displayed):
    [ 11 13 15 17 19 21 23 25 27 29 ]
    
    finished test_vector_linked: {0.062000 secs}
    --------------------------------------------
    - test_vector
    --------------------------------------------
    Inserting 200000 elements...
    time passed for 'inserting': {0.000012 secs}
    
    Erasing half the vector...
    time passed for 'erasing': {5.320000 secs}
    time passed for 'iterating': {0.000000 secs}   
    time passed for 'copying': {0.000000 secs}
    
    Results (up to 10 elements displayed):
    [ 11 13 15 17 19 21 23 25 27 29 ]
    
    finished test_vector: {5.320000 secs}
    

    I was too lazy to use a high-precision timer, but hopefully that gives an idea of why one shouldn't use vector's linear-time erase method in critical paths for non-trivial input sizes: the vector version above took ~86 times longer overall, and the gap widens as the input grows, since erasing half of n elements one at a time costs O(n) per erase (I originally tried with 2 million elements but gave up after waiting almost 10 minutes). That is why I think vector is ever-so-slightly overhyped for this kind of use. That said, we can turn removal from the middle into a very fast constant-time operation without shuffling the order of the elements, without invalidating indices and iterators storing them, and while still using vector. All we have to do is make it store a linked node with prev/next indices to allow skipping over removed elements.

    For removal I used a randomly shuffled source vector of even-numbered indices to determine which elements to remove and in what order. That somewhat mimics a real-world use case where you're removing from the middle of these containers through indices/iterators you formerly obtained, like removing the elements the user formerly selected with a marquee tool after they hit the delete button (and again, you really shouldn't use scalar vector::erase for this with non-trivial sizes; it'd even be better to build a set of indices to remove and use remove_if, which is still better than vector::erase called for one iterator at a time).
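
    The batch-removal alternative mentioned in parentheses can be sketched as follows. Instead of calling std::remove_if with a stateful predicate, this illustration uses an equivalent hand-written single pass; the helper name erase_indices is invented here, and it assumes every index in to_remove is within range.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Erases every element whose index appears in `to_remove` in one O(n) pass,
// preserving the order of the survivors. The indices may come in any order.
template <class T>
void erase_indices(std::vector<T>& v, const std::vector<std::size_t>& to_remove) {
    std::vector<bool> dead(v.size(), false);
    for (std::size_t i : to_remove) dead[i] = true;

    // Same shifting scheme as std::remove_if: each survivor is moved at
    // most once, to the next free position at the front.
    std::size_t out = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (dead[i]) continue;
        if (out != i) v[out] = std::move(v[i]);
        ++out;
    }
    v.erase(v.begin() + static_cast<std::ptrdiff_t>(out), v.end());
}
```

    Removing k elements this way costs O(n + k) in total, compared with O(n) per element for repeated single-element vector::erase calls.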

    Note that iteration does become slightly slower with the linked nodes, and that doesn't have to do with iteration logic so much as the fact that each entry in the vector is larger with the links added (more memory to sequentially process equates to more cache misses and page faults). Nevertheless, if you're doing things like removing elements from very large inputs, the performance skew is so epic for large containers between linear-time and constant-time removal that this tends to be a worthwhile exchange.

  • 2021-02-02 07:07

    The new slot_map proposal claims O(1) for insert and delete.

    There is also a link to a video with a proposed implementation and some previous work.

    If we knew more about the actual structure of the elements there might be some specialized associative containers that are much better.
