> My concern is the following: I am storing a relatively large dataset in a plain Python list, and in order to process the data I must iterate over the list […]
You don't provide enough information for me to answer this question really well. I don't know your use case well enough to tell you which data structures will get you the time complexities you want if you have to optimize for time. The typical solution is to build a new list rather than do repeated deletions, but obviously this doubles(ish) memory usage.
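For illustration, the trade-off looks roughly like this; `data` and `keep` are made-up names standing in for your list and your filtering condition:

```python
# `data` and `keep` are placeholders for your actual list and predicate.
data = list(range(1_000_000))
keep = lambda x: x % 3 != 0

# Repeated deletion: every `del` shifts the tail of the list, so the whole
# pass is O(n**2).
# for i in reversed(range(len(data))):
#     if not keep(data[i]):
#         del data[i]

# Building a new list is a single O(n) pass, at the cost of holding both
# lists in memory until the old one is garbage collected.
data = [x for x in data if keep(x)]
```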
If you have memory usage issues, you might want to abandon in-memory Python constructs and go with an on-disk database. Many databases are available, and sqlite ships with Python.
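As a rough sketch of that direction using the `sqlite3` module from the standard library (the table and column names here are invented for the example):

```python
import sqlite3

conn = sqlite3.connect("items.db")  # lives on disk, not in RAM
conn.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany("INSERT INTO items (value) VALUES (?)",
                 ((float(i),) for i in range(100_000)))
conn.commit()

# The database handles deletion by condition without loading everything into memory.
conn.execute("DELETE FROM items WHERE value < 50000")
conn.commit()

# Iterate over what is left, one row at a time.
for (value,) in conn.execute("SELECT value FROM items ORDER BY id"):
    pass  # process value here

conn.close()
```

Whether this beats in-memory work for speed depends entirely on your access patterns, but it keeps the memory footprint small.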
Depending on your usage and how tight your memory requirements are, `array.array` or numpy might help you, but this is highly dependent on what you need to do. `array.array` will have all the same time complexities as `list`, and numpy arrays sort of will, but they work in some different ways. Using lazy iterators (like generators and the stuff in the `itertools` module) can often reduce memory usage by a factor of n.
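A minimal sketch of the lazy-iterator idea, assuming your data can be produced and consumed one item at a time; `load_items` and the filter are placeholders:

```python
from itertools import islice

def load_items():
    # In a real program this might read from a file or a database cursor.
    for i in range(10_000_000):
        yield i

# Nothing here materializes the full dataset; each stage yields one item at a time.
wanted = (x for x in load_items() if x % 7 == 0)
first_hundred = islice(wanted, 100)

for item in first_hundred:
    pass  # process item here
```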
Using a database will improve the time to delete items from arbitrary locations (though order will be lost, if that is important). Using a `dict` will do the same, but potentially at high memory usage.
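Something along these lines, for example, assuming each item has a unique key you can look it up by (the integer ids here are invented):

```python
# Items keyed by some unique id instead of stored positionally in a list.
items = {i: f"payload-{i}" for i in range(1_000_000)}

# Removing by key is O(1) on average, instead of the O(n) shift a list `del` does,
# but the hash table costs extra memory per entry.
del items[123_456]
items.pop(654_321, None)  # remove if present, ignore if missing
```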
You can also consider `blist` as a drop-in replacement for a list that might get you some of the compromises you want. I don't believe it will drastically increase memory usage, but it will change item removal to O(log n). This comes at the cost of making other operations more expensive, of course.
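Roughly like this, assuming you can install the third-party blist package (a sketch only, not tested against your workload):

```python
# blist is a third-party package (pip install blist) and may not build on
# every Python version, so treat this as an illustration.
from blist import blist

data = blist(range(1_000_000))

# Deleting from the middle is O(log n) in blist rather than O(n) as with a
# plain list, while indexing and iteration still work the same way.
del data[500_000]
print(data[499_999:500_002])
```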
I would have to see testing to believe that the constant factor on memory use for your doubly linked list implementation would be less than the factor of 2 you get by simply creating a new list. I really doubt it.
You will have to share more about your problem class for a more concrete answer, I think, but the general advice is to build a new list rather than delete in place.