biglist = [
    {'title':'U2 Band','link':'u2.com'},
    {'title':'ABC Station','link':'abc.com'},
    {'title':'Live Concert by U2','link':'u2.com'}
]
Make a new dictionary, with 'u2.com' and 'abc.com' as the keys, and your list elements as the values. The dictionary will enforce uniqueness. Something like this:
uniquelist = dict((element['link'], element) for element in reversed(biglist))
(The reversed() is there so that the first occurrence of each link is the one that remains in the dictionary. If you take it out, you will keep the last occurrence instead.)
Then you can get elements back into a list like this:
biglist = list(uniquelist.values())
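Putting that together with the sample data, a quick end-to-end sketch (the printed order assumes Python 3.7+, where plain dicts preserve insertion order):

biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'}
]

# Keying by 'link' and iterating in reverse keeps the first dict seen for each link.
uniquelist = dict((element['link'], element) for element in reversed(biglist))
print(list(uniquelist.values()))
# [{'title': 'U2 Band', 'link': 'u2.com'}, {'title': 'ABC Station', 'link': 'abc.com'}]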
biglist = [
    {'title':'U2 Band','link':'u2.com'},
    {'title':'ABC Station','link':'abc.com'},
    {'title':'Live Concert by U2','link':'u2.com'}
]
def dedupe(lst):
    d = {}
    for x in lst:
        link = x["link"]
        if link in d:
            # already saw this link; skip the duplicate
            continue
        d[link] = x
    return list(d.values())
lst = dedupe(biglist)
dedupe() keeps the first of any duplicates.
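On Python 3.7+, where plain dicts preserve insertion order, the result also comes back in first-seen order; a quick check with the sample data above:

print(dedupe(biglist))
# [{'title': 'U2 Band', 'link': 'u2.com'}, {'title': 'ABC Station', 'link': 'abc.com'}]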
You can use defaultdict to group the items by link, then remove duplicates afterwards if you want to.
from collections import defaultdict

nodupes = defaultdict(list)
for d in biglist:
    nodupes[d['link']].append(d['title'])
This will give you:
defaultdict(<class 'list'>, {'u2.com': ['U2 Band', 'Live Concert by U2'], 'abc.com': ['ABC Station']})
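The grouping alone doesn't give you a deduplicated list of dicts; one way to finish the job from here, keeping the first title seen for each link, would be something along these lines (a sketch, not part of the original answer; iteration order assumes Python 3.7+):

deduped = [{'title': titles[0], 'link': link} for link, titles in nodupes.items()]
# [{'title': 'U2 Band', 'link': 'u2.com'}, {'title': 'ABC Station', 'link': 'abc.com'}]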
Probably the fastest approach, for a really big list, if you want to preserve the exact order of the items that remain, is the following...:
biglist = [
    {'title':'U2 Band','link':'u2.com'},
    {'title':'ABC Station','link':'abc.com'},
    {'title':'Live Concert by U2','link':'u2.com'}
]

known_links = set()
newlist = []
for d in biglist:
    link = d['link']
    if link in known_links: continue
    newlist.append(d)
    known_links.add(link)
biglist[:] = newlist
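If you need this in more than one place, the same idea wraps naturally into a small helper (a sketch; the function name and key parameter are just illustrative):

def dedupe_in_place(items, key='link'):
    # Keep the first item seen for each key value, preserving original order.
    seen = set()
    kept = []
    for d in items:
        k = d[key]
        if k in seen:
            continue
        kept.append(d)
        seen.add(k)
    items[:] = kept

dedupe_in_place(biglist)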
You can sort the list, using the link field of each dictionary as the sort key, then iterate through the list once and remove duplicates (or rather, create a new list with duplicates removed, as is the Python idiom), like so:
# sort the list using the 'link' item as the sort key
biglist.sort(key=lambda elt: elt['link'])

newbiglist = []
for item in biglist:
    if newbiglist == [] or item['link'] != newbiglist[-1]['link']:
        newbiglist.append(item)
This code keeps the first element (by relative order in the original biglist) from any group of "duplicates". That works because the .sort() algorithm used by Python is guaranteed to be a stable sort: it does not change the relative order of elements that compare equal (in this case, elements with the same link).
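For comparison, the same sort-then-scan idea can be written with itertools.groupby (a sketch, not part of the original answer; thanks to the stable sort it keeps the same first-occurrence elements as the loop above):

from itertools import groupby

biglist.sort(key=lambda elt: elt['link'])
newbiglist = [next(group) for _, group in groupby(biglist, key=lambda elt: elt['link'])]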