I have a list of lists in Python:
k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]
And I want to remove the duplicate elements from it. Were it a flat list of hashable items I could just use set(k), but a list is not hashable, so that doesn't work directly.
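For reference, this is the error you hit if you try the set directly:

>>> set(k)
Traceback (most recent call last):
  ...
TypeError: unhashable type: 'list'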
Create a dictionary with each sublist (converted to a tuple) as the key, then turn the keys back into lists:
k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]
dict_tuple = {tuple(item): index for index, item in enumerate(k)}
print([list(itm) for itm in dict_tuple.keys()])
# prints [[1, 2], [4], [5, 6, 2], [3]] (dicts preserve insertion order on Python 3.7+)
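A related sketch of my own (not part of the answer above): dict.fromkeys builds the same tuple-keyed dictionary in one call and also keeps first-seen order on Python 3.7+:

k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]
# dict keys are unique, and insertion order is preserved on Python 3.7+
dedup = [list(t) for t in dict.fromkeys(tuple(item) for item in k)]
print(dedup)  # [[1, 2], [4], [5, 6, 2], [3]]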
Strangely, the answers above remove the 'duplicates', but what if I want to remove the duplicated values as well, i.e. keep only the entries whose key appears exactly once? The following should be useful; it works mostly in place, with only the final filtering step building a new list:
def dictRemoveDuplicates():
    # assumes the list is grouped by its first element (the key)
    a = [[1, 'somevalue1'], [1, 'somevalue2'], [2, 'somevalue1'],
         [3, 'somevalue4'], [5, 'somevalue5'], [5, 'somevalue1'],
         [5, 'somevalue1'], [5, 'somevalue8'], [6, 'somevalue9'],
         [6, 'somevalue0'], [6, 'somevalue1'], [7, 'somevalue7']]
    print(a)
    temp = 0
    position = -1
    for pageNo, item in a:
        position += 1
        if pageNo != temp:
            # first time this key is seen: remember it and move on
            temp = pageNo
        else:
            # repeated key: zero out this entry and the one before it
            a[position] = 0
            a[position - 1] = 0
    a = [x for x in a if x != 0]  # drop the zeroed-out entries
    print(a)

dictRemoveDuplicates()
and the output is:
[[1, 'somevalue1'], [1, 'somevalue2'], [2, 'somevalue1'], [3, 'somevalue4'], [5, 'somevalue5'], [5, 'somevalue1'], [5, 'somevalue1'], [5, 'somevalue8'], [6, 'somevalue9'], [6, 'somevalue0'], [6, 'somevalue1'], [7, 'somevalue7']]
[[2, 'somevalue1'], [3, 'somevalue4'], [7, 'somevalue7']]
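For comparison, a minimal sketch of the same idea using collections.Counter; unlike the loop above it doesn't require the list to be grouped by key (counts and unique_only are names I made up):

from collections import Counter

a = [[1, 'somevalue1'], [1, 'somevalue2'], [2, 'somevalue1'],
     [3, 'somevalue4'], [5, 'somevalue5'], [5, 'somevalue1'],
     [5, 'somevalue1'], [5, 'somevalue8'], [6, 'somevalue9'],
     [6, 'somevalue0'], [6, 'somevalue1'], [7, 'somevalue7']]

# count how often each key (the first element) appears
counts = Counter(pageNo for pageNo, _ in a)

# keep only the pairs whose key is not duplicated
unique_only = [pair for pair in a if counts[pair[0]] == 1]
print(unique_only)  # [[2, 'somevalue1'], [3, 'somevalue4'], [7, 'somevalue7']]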
A bit of background: I just started with Python and recently learned comprehensions.
k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]
# join each sublist into a string so it becomes hashable, dedupe via a set,
# then split and convert back to ints
joined = {'.'.join(str(int_elem) for int_elem in _list) for _list in k}
dedup = [[int(s) for s in elem.split('.')] for elem in joined]
>>> k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]
>>> k = sorted(k)
>>> k
[[1, 2], [1, 2], [3], [4], [4], [5, 6, 2]]
>>> dedup = [k[i] for i in range(len(k)) if i == 0 or k[i] != k[i-1]]
>>> dedup
[[1, 2], [3], [4], [5, 6, 2]]
I don't know if it's necessarily faster, but you don't have to use tuples and sets.
Even your "long" list is pretty short. Also, did you choose them to match the actual data? Performance will vary with what these data actually look like. For example, you have a short list repeated over and over to make a longer list. This means that the quadratic solution is linear in your benchmarks, but not in reality.
For actually-large lists, the set code is your best bet—it's linear (although space-hungry). The sort and groupby methods are O(n log n) and the loop in method is obviously quadratic, so you know how these will scale as n gets really big. If this is the real size of the data you are analyzing, then who cares? It's tiny.
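To make that concrete, here is a minimal sketch of the sort-and-groupby and loop-with-'in' methods referred to above (the set version appears just below):

from itertools import groupby

k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]]

# O(n log n): sort, then collapse runs of equal neighbours with groupby
dedup_groupby = [key for key, _ in groupby(sorted(k))]

# O(n^2): the loop-with-'in' method; every 'in' test rescans the result so far
dedup_loop = []
for item in k:
    if item not in dedup_loop:
        dedup_loop.append(item)

print(dedup_groupby)  # [[1, 2], [3], [4], [5, 6, 2]]
print(dedup_loop)     # [[1, 2], [4], [5, 6, 2], [3]]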
Incidentally, I'm seeing a noticeable speedup if I don't form an intermediate list to make the set, that is to say if I replace
kt = [tuple(i) for i in k]
skt = set(kt)
with
skt = set(tuple(i) for i in k)
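If you want to check that on your own data, a rough timeit sketch (the * 1000 repetition is only there to make the input big enough to time):

from timeit import timeit

k = [[1, 2], [4], [5, 6, 2], [1, 2], [3], [4]] * 1000

# set built from an intermediate list vs. straight from a generator expression
t_list = timeit(lambda: set([tuple(i) for i in k]), number=200)
t_gen = timeit(lambda: set(tuple(i) for i in k), number=200)
print(t_list, t_gen)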
The real solution may depend on more information: Are you sure that a list of lists is really the representation you need?
A list of tuples and a set comprehension ({}) can be used to remove duplicates:
>>> [list(tupl) for tupl in {tuple(item) for item in k}]
[[1, 2], [5, 6, 2], [3], [4]]
>>>