Python: Memory usage and optimization when modifying lists

失恋的感觉 · 2021-02-04 03:28

The problem

My concern is the following: I am storing a relatively large dataset in a classical Python list, and in order to process the data I must iterate over the list and delete items as I go.

7 Answers
  •  旧时难觅i · 2021-02-04 03:56

    You haven't provided enough information for me to answer this question really well. I don't know your use case well enough to say which data structures will get you the time complexities you want if you have to optimize for time. The typical solution is to build a new list rather than perform repeated deletions, but obviously this doubles(ish) memory usage.
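
    To make the "build a new list" suggestion concrete, here is a minimal sketch; keep() is a made-up stand-in for whatever test decides which items survive:

        data = list(range(1_000_000))

        def keep(x):
            # Stand-in predicate: keep even values.
            return x % 2 == 0

        # Repeated `del data[i]` is O(n) per deletion (O(n^2) overall) because
        # every removal shifts the tail of the list. A single pass that builds
        # a new list is O(n) total, at the cost of briefly holding both lists
        # in memory (the ~2x factor mentioned above).
        data = [x for x in data if keep(x)]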

    If you have memory usage issues, you might want to abandon in-memory Python constructs and go with an on-disk database. Many databases are available, and sqlite ships with Python. Depending on your usage and how tight your memory requirements are, array.array or numpy might help, but this is highly dependent on what you need to do. array.array has all the same time complexities as list; numpy arrays mostly do too, though they work somewhat differently. Using lazy iterators (like generators and the tools in the itertools module) can often reduce memory usage by a factor of n.
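
    As a sketch of the lazy-iterator point: a generator pipeline processes one item at a time instead of materializing intermediate lists, so peak memory stays roughly constant regardless of dataset size. records() here is a hypothetical data source:

        import itertools

        def records(n):
            # Hypothetical source that yields one record at a time.
            for i in range(n):
                yield i * i

        # Generator expressions and itertools compose lazily: no intermediate
        # list of 10 million items is ever built.
        first_five = itertools.islice(
            (r for r in records(10_000_000) if r % 3 == 0),
            5,
        )
        for r in first_five:
            print(r)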

    Using a database will make deleting items from arbitrary locations faster (though order will be lost, if that is important). Using a dict will do the same, but potentially at a high memory cost.
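
    For the on-disk route, sqlite3 from the standard library is enough for a sketch; the table and column names are made up for illustration:

        import sqlite3

        conn = sqlite3.connect("data.db")
        conn.execute(
            "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, value REAL)"
        )
        conn.executemany(
            "INSERT INTO items (value) VALUES (?)",
            ((float(i),) for i in range(100_000)),
        )
        # Deletion becomes a keyed DELETE that never needs the dataset in RAM.
        # Original insertion order survives only because the autoincrementing
        # id column records it; without such a column, order is lost.
        conn.execute("DELETE FROM items WHERE value < ?", (50_000.0,))
        conn.commit()
        conn.close()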

    You can also consider blist as a drop-in replacement for list that might strike some of the compromises you want. I don't believe it will drastically increase memory usage, but it will change item removal to O(log n). This comes at the cost of making other operations more expensive, of course.
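
    Usage is indeed drop-in, assuming the third-party blist package is installed (pip install blist; it is an old package, so it may not build on every Python version):

        from blist import blist

        items = blist(range(1_000_000))  # B+-tree-backed list replacement
        del items[500_000]               # O(log n) instead of list's O(n)
        items.insert(0, -1)              # arbitrary-position insert, O(log n)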

    I would have to see testing to believe that the constant factor for memory use of your doubly linked list implementation would be less than the 2 you get by simply creating a new list. I really doubt it.

    You will have to share more about your problem class for a more concrete answer, I think, but the general advice is:

    • Iterate over the list, building a new list as you go along (or use a generator to yield the items as you need them). If you actually need a list, this has a memory factor of 2, which scales fine but doesn't help if you are short on memory, period.
    • If you are running out of memory, then rather than micro-optimizing you probably want an on-disk database or to store your data in a file.
