Python: Memory usage and optimization when modifying lists

失恋的感觉 2021-02-04 03:28

The problem

My concern is the following: I am storing a relatively large dataset in a classic Python list, and in order to process the data I must iterate over the list…

7 answers
  •  -上瘾入骨i
    2021-02-04 03:59

    A doubly linked list is worse than simply rebuilding the list. A Python list uses 5 words of fixed overhead plus one word (a pointer) per element. A doubly linked list needs about 5 words per element (object header, previous and next pointers, and the value pointer). Even a singly linked list still needs about 4 words per element, which is far worse than the at most roughly 2 words per element that rebuilding takes while the old and the new list briefly coexist.
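
    To check these per-element figures on your own interpreter, a sketch like the following should do, assuming CPython (exact byte counts vary by version and platform). The Node class is a hypothetical minimal node, not code from the question. sys.getsizeof counts only the list or node itself, not the objects it points to, and for the node it also includes the garbage collector's header, so expect the node figure to come out somewhat above five words.

```python
import struct
import sys

WORD = struct.calcsize("P")  # size of a pointer in bytes: 8 on a 64-bit build


class Node:
    """Hypothetical doubly linked list node (not from the question).

    __slots__ removes the per-instance __dict__, which makes this about the
    leanest node a pure-Python linked list can have.
    """

    __slots__ = ("prev", "next", "value")

    def __init__(self, value):
        self.prev = None
        self.next = None
        self.value = value


def list_bytes_per_element(n=10_000):
    # [None] * n allocates exactly n pointer slots, so the difference from an
    # empty list is the per-element cost of the list itself.
    return (sys.getsizeof([None] * n) - sys.getsizeof([])) / n


def node_bytes_per_element():
    # getsizeof includes the object header, the three slot pointers and the
    # garbage collector's bookkeeping, but not the referenced value.
    return sys.getsizeof(Node(None))


print("list:", list_bytes_per_element() / WORD, "words per element")
print("node:", node_bytes_per_element() / WORD, "words per element")
```

    Both layouts point at the same value objects, so the cost of the values themselves cancels out of the comparison.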

    From a memory-usage perspective, moving the kept items towards the front of the list and deleting the slack at the end is the best approach; CPython shrinks the list's internal pointer array once fewer than half of its allocated slots are in use. The question to ask yourself is whether it really matters. The list entries presumably point to some data, and unless the list holds many references to the same objects, the memory used by the list itself is insignificant compared with that data. Given that, you might as well just build a new list.
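
    The shrinking behaviour can be observed with sys.getsizeof. It is a CPython implementation detail (the resize logic keeps the existing buffer while at least half of it is in use), so treat this as an illustration rather than a language guarantee:

```python
import sys

data = [None] * 10_000        # allocates exactly 10,000 pointer slots
full = sys.getsizeof(data)

del data[6_000:]              # 6,000 items left: still at least half the capacity
after_small_cut = sys.getsizeof(data)

del data[2_500:]              # 2,500 items left: below half, so the buffer shrinks
after_big_cut = sys.getsizeof(data)

print(full, after_small_cut, after_big_cut)
```

    On CPython the first cut leaves the reported size unchanged and the second one makes it drop.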

    If you do build a new list, the approach you suggested is not ideal: there's no apparent reason you can't make a single pass over the list. Calling gc.collect() is also unnecessary and actually harmful, since it only costs time: CPython's reference counting frees the old list immediately anyway, and even the garbage collectors of other Python implementations are better off collecting when they come under memory pressure. Something like this will work:

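    while processingdata:
        retained = []
        for item in somelist:
            dosomething(item)
            # keep only the items that should survive this pass
            if not somecondition(item):
                retained.append(item)
        # rebinding drops the last reference to the old list, which
        # CPython's reference counting frees right away
        somelist = retained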
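
    Here is a runnable version of that loop for experimenting. The question's dosomething, somecondition and processingdata are not shown, so the ones below are hypothetical stand-ins: each pass decrements a counter on every item, an item is dropped once its counter reaches zero, and the loop stops when the list is empty. The weak reference in the second function only serves to show that CPython frees the previous list as soon as the name is rebound, with no gc.collect() call (plain lists cannot be weakly referenced, hence the small subclass).

```python
import weakref


class Item:
    """Hypothetical work item that survives until its countdown reaches zero."""

    def __init__(self, name, countdown):
        self.name = name
        self.countdown = countdown


def dosomething(item):
    # Hypothetical stand-in for the real per-item processing.
    item.countdown -= 1


def somecondition(item):
    # Hypothetical removal test: True means "drop this item".
    return item.countdown <= 0


def run(somelist):
    """Run passes until nothing is left; return the number of passes."""
    passes = 0
    while somelist:  # hypothetical stand-in for `processingdata`
        retained = []
        for item in somelist:
            dosomething(item)
            if not somecondition(item):
                retained.append(item)
        somelist = retained  # rebinding only; no gc.collect() needed
        passes += 1
    return passes


class TrackedList(list):
    """A list subclass exists only so we can take a weak reference to it."""


def old_list_freed_on_rebind():
    somelist = TrackedList([Item("a", 1), Item("b", 2)])
    watcher = weakref.ref(somelist)
    somelist = [item for item in somelist if item.countdown > 1]
    # On CPython, reference counting has already freed the TrackedList here.
    return watcher() is None


print(run([Item("a", 1), Item("b", 3), Item("c", 2)]))  # 3
print(old_list_freed_on_rebind())                       # True on CPython
```

    Because run() only rebinds its local name, the caller's list object is left untouched; only the items themselves are mutated.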
    

    If you don't mind using side effects in list comprehensions, then the following is also an option:

    def process_and_decide(item):
        dosomething(item)
        return not somecondition(item)
    
    while processingdata:
        somelist = [item for item in somelist if process_and_decide(item)]
    

    The in-place method can also be refactored so that the mechanism is separated from the business logic:

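    def inplace_filter(func, list_):
        pos = 0  # next slot to write a kept item into
        for item in list_:
            if func(item):
                # pos never runs ahead of the read position, so this is safe
                list_[pos] = item
                pos += 1
        # drop the stale tail in one operation
        del list_[pos:]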
    
    while processingdata:
        inplace_filter(process_and_decide, somelist)
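
    A self-contained check of the in-place version, with a hypothetical keep_odd() standing in for process_and_decide(). Besides memory use, the point of filtering in place is that the list object keeps its identity, so every other reference to it sees the filtered contents:

```python
def inplace_filter(func, list_):
    """Keep the items for which func(item) is true, modifying list_ in place."""
    pos = 0  # next slot to write a kept item into
    for item in list_:
        if func(item):
            list_[pos] = item  # safe: pos never runs ahead of the read position
            pos += 1
    del list_[pos:]  # cut off the stale tail in one operation


def keep_odd(n):
    # Hypothetical stand-in for the question's process_and_decide().
    return n % 2 == 1


data = list(range(10))
alias = data                    # a second reference to the same list object
inplace_filter(keep_odd, data)
print(data)                     # [1, 3, 5, 7, 9]
print(alias is data)            # True: every reference sees the filtered list
```

    Rebinding instead (data = [n for n in data if keep_odd(n)]) would leave alias pointing at the old, unfiltered list.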
    
