Repeatedly appending to a large list (Python 2.6.6)

前端 未结 5 600
小蘑菇
小蘑菇 2020-12-29 05:30

I have a project where I am reading in ASCII values from a microcontroller through a serial port (looks like this : AA FF BA 11 43 CF etc) The input is coming in quickly (38

相关标签:
5条回答
  • 2020-12-29 06:09

    I'm given to understand that the larger a list becomes, the slower list operations become.

    That's not true in general. Lists in Python are, despite the name, not linked lists but arrays. There are operations that are O(n) on arrays (copying and searching, for instance), but you don't seem to use any of these. As a rule of thumb: If it's widely used and idiomatic, some smart people went and chose a smart way to do it. list.append is a widely-used builtin (and the underlying C function is also used in other places, e.g. list comprehensions). If there was a faster way, it would already be in use.

    As you will see when you inspect the source code, lists are overallocating, i.e. when they are resized, they allocate more than needed for one item so the next n items can be appended without need to another resize (which is O(n)). The growth isn't constant, it is proportional with the list size, so resizing becomes rarer as the list grows larger. Here's the snippet from listobject.c:list_resize that determines the overallocation:

    /* This over-allocates proportional to the list size, making room
     * for additional growth.  The over-allocation is mild, but is
     * enough to give linear-time amortized behavior over a long
     * sequence of appends() in the presence of a poorly-performing
     * system realloc().
     * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
     */
    new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
    

    As Mark Ransom points out, older Python versions (<2.7, 3.0) have a bug that make the GC sabotage this. If you have such a Python version, you may want to disable the gc. If you can't because you generate too much garbage (that slips refcounting), you're out of luck though.

    0 讨论(0)
  • 2020-12-29 06:10

    It might be faster to use numpy if you know how long the array is going to be and you can convert your hex codes to ints:

    import numpy
    a = numpy.zeros(3000000, numpy.int32)
    for i in range(3000000):
       a[i] = int(scanHexFromSerial(),16)
    

    This will leave you with an array of integers (which you could convert back to hex with hex()), but depending on your application maybe that will work just as well for you.

    0 讨论(0)
  • 2020-12-29 06:13

    First of all, 38 two-character sets per second, 1 stop bit, 8 data bits, and no parity, is only 760 baud, not fast at all.

    But anyway, my suggestion, if you're worried about having overly large lists/don't want to use one huge list, is just to store store a list on disk once it reaches a certain size and start a new list, repeating until you've gotten all the data, then combining all the lists into one once you're done receiving the data.

    Though you may skip the sublists completely and just go with nmichaels' suggestion, writing the data to a file as you get it and using a small circular buffer to hold the received data that has not yet been written.

    0 讨论(0)
  • 2020-12-29 06:22

    One thing you might want to consider is writing your data to a file as it's collected. I don't know (or really care) if it will affect performance, but it will help ensure that you don't lose all your data if power blips. Once you've got all the data, you can suck it out of the file and jam it in a list or an array or a numpy matrix or whatever for processing.

    0 讨论(0)
  • 2020-12-29 06:23

    Appending to a python list has a constant cost. It is not affected by the number of items in the list (in theory). In practice appending to a list will get slower once you run out of memory and the system starts swapping.

    http://wiki.python.org/moin/TimeComplexity

    It would be helpful to understand why you actually append things into a list. What are you planning to do with the items. If you don't need all of them you could build a ring buffer, if you don't need to do computation you could write the list to a file, etc.

    0 讨论(0)
提交回复
热议问题