Why is a LinkedList Generally Slower than a List?

旧时难觅i 2021-02-01 13:47

I started using some LinkedLists instead of Lists in some of my C# algorithms, hoping to speed them up. However, I noticed that they just felt slower. Like any good developer, I

6 Answers
  • 2021-02-01 14:16

    Since the other answers didn't mention this, I'm adding another.

    Although your print statement says "List Insert", you actually called List<T>.Add, which is the one kind of "insertion" that List is actually good at. Add is a special case: it just uses the next free element in the underlying storage array, and nothing has to be moved out of the way. Try using List<T>.Insert instead to make it the worst case rather than the best case.

    Edit:

    To summarize, for the purposes of insertion, a list is a special-purpose data structure that is only fast at one kind of insertion: append to the end. A linked list is a general-purpose data structure that is equally fast at inserting anywhere into the list, provided you already hold a reference to the neighboring node. And there is one more detail: the linked list has higher memory and CPU overhead, so its fixed costs are higher.

    So your benchmark compares general-purpose linked-list insertion against special-purpose list append, and it is not surprising that the finely tuned data structure, used exactly as intended, performs well. If you want the linked list to compare favorably, you need a benchmark that the list finds challenging, which means inserting at the beginning or into the middle of the list.
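
    To make that concrete, here is a minimal sketch of a benchmark that actually stresses List<T>: it inserts at index 0 and compares that with LinkedList<T>.AddFirst. The class name, method names, and element count are illustrative assumptions, not taken from the original question, and the timings will vary by machine.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;

    public static class FrontInsertDemo
    {
        // Inserts 0..count-1 at index 0, the worst case for List<T>:
        // every call shifts all existing elements one slot to the right.
        public static List<int> FillListAtFront(int count)
        {
            var list = new List<int>();
            for (int i = 0; i < count; i++)
                list.Insert(0, i);
            return list;
        }

        // Same workload for LinkedList<T>: AddFirst only relinks the head.
        public static LinkedList<int> FillLinkedListAtFront(int count)
        {
            var list = new LinkedList<int>();
            for (int i = 0; i < count; i++)
                list.AddFirst(i);
            return list;
        }

        public static void Main()
        {
            // Hypothetical size; front inserts into a List<T> get slow quickly.
            const int count = 100_000;

            var sw = Stopwatch.StartNew();
            FillListAtFront(count);
            Console.WriteLine($"List<T>.Insert(0, x):      {sw.ElapsedMilliseconds} ms");

            sw.Restart();
            FillLinkedListAtFront(count);
            Console.WriteLine($"LinkedList<T>.AddFirst(x): {sw.ElapsedMilliseconds} ms");
        }
    }
    ```

    Both methods produce the same element order, so the comparison is like for like; only the cost of getting there differs.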

  • 2021-02-01 14:19

    Keep in mind that you've got lists of primitives. For List this is very simple because it creates a whole array of int and it's very easy for it to shift these down when it doesn't have to allocate more memory.

    Contrast this to a LinkedList that always must allocate memory to wrap the ints. Thus I think the memory allocation is probably what's contributing the most to your time. If you already had the node allocated, it should be faster overall. I'd try an experiment with the overload of AddFirst that takes a LinkedListNode to verify (that is, create the LinkedListNode outside of the scope of the timer, just time the add of it).
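
    A rough sketch of that experiment might look like the following. It assumes int elements and a hypothetical count of one million; the class and method names are made up for illustration. The nodes are created before the stopwatch starts, so the first timed loop measures only the relinking done by AddFirst(LinkedListNode<T>).

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;

    public static class PreallocatedNodesDemo
    {
        // Allocates the nodes up front so the timed loop only relinks them.
        public static LinkedListNode<int>[] CreateNodes(int count)
        {
            var nodes = new LinkedListNode<int>[count];
            for (int i = 0; i < count; i++)
                nodes[i] = new LinkedListNode<int>(i);
            return nodes;
        }

        // Uses the AddFirst(LinkedListNode<T>) overload: no allocation per call.
        public static LinkedList<int> AddPreallocated(LinkedListNode<int>[] nodes)
        {
            var list = new LinkedList<int>();
            foreach (var node in nodes)
                list.AddFirst(node);
            return list;
        }

        public static void Main()
        {
            const int count = 1_000_000; // hypothetical size
            var nodes = CreateNodes(count); // deliberately not timed

            var sw = Stopwatch.StartNew();
            AddPreallocated(nodes);
            Console.WriteLine($"AddFirst(node), preallocated: {sw.ElapsedMilliseconds} ms");

            sw.Restart();
            var fresh = new LinkedList<int>();
            for (int i = 0; i < count; i++)
                fresh.AddFirst(i); // allocates one node per call
            Console.WriteLine($"AddFirst(value), allocating:  {sw.ElapsedMilliseconds} ms");
        }
    }
    ```

    Note that a node can belong to only one list at a time, so the array returned by CreateNodes cannot be reused for a second list without removing the nodes first.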

    Iterating is similar: it's much more efficient to go to the next index in an internal array than to follow links.

  • 2021-02-01 14:22

    Update (in response to your comment): you're right, discussing big-O notation by itself is not exactly useful. I included a link to James's answer in my original response because he already offered a good explanation of the technical reasons why List<T> outperforms LinkedList<T> in general.

    Basically, it's a matter of memory allocation and locality. When all of your collection's elements are stored in an array internally (as is the case with List<T>), it's all in one contiguous block of memory which can be accessed very quickly. This applies both to adding (as this simply writes to a location within the already-allocated array) as well as iterating (as this accesses many memory locations that are very close together rather than having to follow pointers to completely disconnected memory locations).
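
    The iteration half of that claim is easy to try for yourself. The sketch below sums the same values from a List<int> and a LinkedList<int>; the names and the element count are illustrative assumptions, and no particular timings are implied.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;

    public static class IterationDemo
    {
        // Walks the contiguous backing array of the list.
        public static long SumList(List<int> items)
        {
            long total = 0;
            foreach (var x in items)
                total += x;
            return total;
        }

        // Follows one node reference per element.
        public static long SumLinked(LinkedList<int> items)
        {
            long total = 0;
            foreach (var x in items)
                total += x;
            return total;
        }

        public static void Main()
        {
            const int count = 5_000_000; // hypothetical size
            var list = new List<int>(Enumerable.Range(0, count));
            var linked = new LinkedList<int>(Enumerable.Range(0, count));

            var sw = Stopwatch.StartNew();
            long a = SumList(list);
            Console.WriteLine($"List<T>:       {sw.ElapsedMilliseconds} ms (sum {a})");

            sw.Restart();
            long b = SumLinked(linked);
            Console.WriteLine($"LinkedList<T>: {sw.ElapsedMilliseconds} ms (sum {b})");
        }
    }
    ```

    One caveat: nodes allocated back to back in a fresh process may well end up close together in memory, so a newly built linked list can understate the locality penalty that a long-lived, frequently modified one would show.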

    A LinkedList<T> is a specialized collection, which only outshines List<T> in the case where you are performing random insertions or removals from the middle of the list—and even then, only maybe.

    As for the question of scaling: you're right, if big-O notation is all about how well an operation scales, then an O(1) operation should eventually beat out an O(>1) operation given a large enough input—which is obviously what you were going for with 20 million iterations.

    This is why I mentioned that List<T>.Add has an amortized complexity of O(1). That means adding to a list is also an operation that scales linearly with the size of the input, the same (effectively) as with a linked list. Forget about the fact that occasionally the list has to resize itself (this is where the "amortized" comes in; I encourage you to visit that Wikipedia article if you haven't already). They scale the same.

    Now, interestingly, and perhaps counter-intuitively, this means that if anything, the performance difference between List<T> and LinkedList<T> (again, when it comes to adding) actually becomes more obvious as the number of elements increases. The reason is that when the list runs out of space in its internal array, it doubles the size of the array; and thus with more and more elements, the frequency of resizing operations decreases—to the point where the array is basically never resizing.

    So let's say a List<T> starts with an internal array large enough to hold 4 elements (I believe that's accurate, though I don't remember for sure). Then as you add up to 20 million elements, it doubles itself a total of ceil(log2(20000000 / 4)) = 23 times. Compare this to the 20 million times you're performing the considerably less efficient AddLast on a LinkedList<T>, which allocates a new LinkedListNode<T> with every call, and those 23 resizes suddenly seem pretty insignificant.
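
    The resize count is simple enough to check by simulating the doubling. This sketch does not touch List<T> itself; it only models "start at capacity 4, double when full", which is the assumption stated above, and the names are hypothetical.

    ```csharp
    using System;

    public static class GrowthDemo
    {
        // Models an array that doubles whenever it is full and reports how many
        // doublings and how many element copies are needed to hold `count` items.
        public static (int Resizes, long Copies) SimulateDoubling(int initialCapacity, int count)
        {
            int capacity = initialCapacity;
            int resizes = 0;
            long copies = 0;
            while (capacity < count)
            {
                copies += capacity; // the full array is copied into the larger one
                capacity *= 2;
                resizes++;
            }
            return (resizes, copies);
        }

        public static void Main()
        {
            var (resizes, copies) = SimulateDoubling(4, 20_000_000);
            // prints: 23 resizes, 33554428 element copies
            Console.WriteLine($"{resizes} resizes, {copies} element copies");
        }
    }
    ```

    Under that model the 20 million adds cost fewer than two extra element copies each on average, which is what "amortized O(1)" means in practice.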

    I hope this helps! If I haven't been clear on any points, let me know and I will do my best to clarify and/or correct myself.


    James is right on.

    Remember that big-O notation is meant to give you an idea of how the performance of an algorithm scales. It does not mean that something that performs in guaranteed O(1) time will outperform something else that performs in amortized O(1) time (as is the case with List<T>).

    Suppose you have a choice of two jobs, one of which requires a commute 5 miles down a road that occasionally suffers from traffic jams. Ordinarily this drive should take you about 10 minutes, but on a bad day it could be more like 30 minutes. The other job is 60 miles away but the highway is always clear and never has any traffic jams. This drive always takes you an hour.

    That's basically the situation with List<T> and LinkedList<T> for purposes of adding to the end of the list.

  • 2021-02-01 14:25

    As James stated in his answer, memory allocation is probably one reason why the LinkedList is slower.

    Additionally I believe the major difference originates from an invalid test. You are adding items to the beginning of the linked list, but to the end of the ordinary list. Wouldn't adding items to the beginning of the ordinary list shift the benchmarking results in favor of the LinkedList again?

  • 2021-02-01 14:32

    I highly recommend the article Number crunching: why you should never use a linked-list again. There isn't much there that isn't anywhere else, but I spent quite a bit of time trying to figure out why LinkedList<T> was so much slower than List<T> in situations I thought would obviously favor the linked list before I found it, and after looking it over, things made a bit more sense:

    The linked list has items in disjoint areas of memory, and as a result, one could say it is cache line hostile, because it maximizes cache misses. The disjoint memory makes traversing the list result in frequent and costly unexpected RAM lookups.

    A vector [equivalent to ArrayList or List<T>] on the other hand has its items stored in adjacent memory, and in so doing, is able to maximize cache utilization and avoid cache misses. Often, in practice, this more than offsets the cost incurred when shuffling data around.

    If you'd like to hear that from a more authoritative source, this is from Tips for Improving Time-Critical Code on MSDN:

    Sometimes a data structure that looks great turns out to be horrible because of poor locality of reference. Here are two examples:

    • Dynamically allocated linked lists (LinkedListNode<T> is a reference type, so it is dynamically allocated) can reduce program performance because when you search for an item or when you traverse a list to the end, each skipped link could miss the cache or cause a page fault. A list implementation based on simple arrays might actually be much faster because of better caching and fewer page faults— even allowing for the fact that the array would be harder to grow, it still might be faster.

    • Hash tables that use dynamically allocated linked lists can degrade performance. By extension, hash tables that use dynamically allocated linked lists to store their contents might perform substantially worse. In fact, in the final analysis, a simple linear search through an array might actually be faster (depending on the circumstances). Array-based hash tables (IIRC, Dictionary<TKey,TValue> is array-based) are an often-overlooked implementation which frequently has superior performance.


    This is my original (far less useful) answer where I did some performance tests.

    The general consensus seems to be that the linked list is allocating memory on every add (because the node is a class) and that does seem to be the case. I tried to isolate the allocation code from the timed code that adds items to the list and made a gist from the result: https://gist.github.com/zeldafreak/d11ae7781f5d43206f65

    I ran the test code 5 times, calling GC.Collect() between runs. Inserting 20 million nodes into the linked list takes 193-211ms (198ms), compared to 77-89ms (81ms) for the list, so even without the allocation, a standard list is a little over 2x faster. Iterating over a list takes 54-59ms, compared to 76-101ms for the linked list, so the list is a more modest 50%-ish faster there.

  • 2021-02-01 14:40

    I've done the same test with List and LinkedList, inserting actual objects (anonymous types, actually) into the list, and LinkedList is slower than List in that case as well.

    However, LinkedList DOES speed up if you insert items like this, instead of using AddFirst, and AddLast:

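    using System.Collections.Generic;
    using System.Linq;

    // int and this sample input are placeholders for your own element type and data.
    IEnumerable<int> aLotOfStuff = Enumerable.Range(0, 1000);

    var list = new LinkedList<int>();
    LinkedListNode<int> last = null;
    foreach (var x in aLotOfStuff)
    {
        // Append after the node we track ourselves instead of calling AddLast.
        if (last == null)
            last = list.AddFirst(x);
        else
            last = list.AddAfter(last, x);
    }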
    

    AddAfter seems to be faster than AddLast. I first wondered whether AddLast() traverses the entire list to reach the end, but it does not: LinkedList<T> is doubly linked, reaches its last node directly, and documents AddLast as an O(1) operation. Whatever causes the difference I saw, it is not a traversal.
