I have some data that looks something like this:
ID1 ID2 ID3
ID1 ID4 ID5
ID3 ID5 ID7 ID6
...
...
where each row is a group.
M
TL;DR: Go with option 2. Just use sets from the start.
In Python, sets are hash-sets, and lists are dynamic arrays. Inserting is O(1)
for both, but checking if an element exists is O(n)
for the list and O(1)
for the set.
So option 1 is immediately out. If you are inserting n
items and need to check the list every time, then the overall complexity becomes O(n^2)
.
Options 2 and 3 are both optimal at O(n)
overall. Option 2 might be faster in micro-benchnarks because you don't need to move objects between collections. In practice, choose the option that is easier to read and maintain in your specific circumstance.