How to remove these duplicates in a list (python)

野的像风 2021-01-03 15:26
biglist = [
    {'title':'U2 Band','link':'u2.com'},
    {'title':'ABC Station','link':'abc.com'},
    {'title':'Live Concert by U2','link

5 Answers
  • 2021-01-03 15:50

    Make a new dictionary, with 'u2.com' and 'abc.com' as the keys, and your list elements as the values. The dictionary will enforce uniqueness. Something like this:

    uniquelist = dict((element['link'], element) for element in reversed(biglist))
    

    (The reversed is there so that the first elements in the list will be the ones that remain in the dictionary. If you take that out, then you will get the last element instead).

    Then you can get elements back into a list like this:

    biglist = list(uniquelist.values())
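
To see what reversed changes, here is a small sketch that builds the dictionary both ways on the sample data used in the other answers. The names keep_first and keep_last are illustrative only, and the ordering remark assumes Python 3.7+, where dictionaries preserve insertion order.

```python
biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'},
]

# Iterating in reverse means earlier entries overwrite later ones,
# so the first occurrence of each link survives.
keep_first = {element['link']: element for element in reversed(biglist)}

# Iterating forward means later entries overwrite earlier ones,
# so the last occurrence of each link survives.
keep_last = {element['link']: element for element in biglist}

print(keep_first['u2.com']['title'])  # U2 Band
print(keep_last['u2.com']['title'])   # Live Concert by U2

# Key order follows the reversed pass, so it may differ from the
# order of first occurrences in the original list.
deduped = list(keep_first.values())
```

If the order of the surviving items matters, a loop with a set of seen links is the safer choice.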
    
  • 2021-01-03 15:55
    biglist = \
    [ 
        {'title':'U2 Band','link':'u2.com'}, 
        {'title':'ABC Station','link':'abc.com'}, 
        {'title':'Live Concert by U2','link':'u2.com'} 
    ]
    
    def dedupe(lst):
        d = {}
        for x in lst:
            link = x["link"]
            if link in d:
                continue
            d[link] = x
        return list(d.values())
    
    lst = dedupe(biglist)
    

    dedupe() keeps the first of any duplicates.

  • 2021-01-03 16:03

    You can use defaultdict to group items by link, then remove duplicates if you want to.

    from collections import defaultdict
    
    nodupes = defaultdict(list)
    for d in biglist:
        nodupes[d['link']].append(d['title'])
    

    This will give you:

    defaultdict(<class 'list'>, {'u2.com': ['U2 Band', 'Live Concert by U2'],
    'abc.com': ['ABC Station']})
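
If the goal is one entry per link, the grouped result can be collapsed afterwards. A minimal sketch, assuming the key is 'link' as in the question's data and that keeping the first title in each group is what you want (the name first_per_link is made up for illustration):

```python
from collections import defaultdict

biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'},
]

# Group titles by link; each new link starts with an empty list.
nodupes = defaultdict(list)
for d in biglist:
    nodupes[d['link']].append(d['title'])

# Collapse each group to its first title to get one entry per link.
first_per_link = [
    {'title': titles[0], 'link': link}
    for link, titles in nodupes.items()
]
print(first_per_link)
```

Grouping first is mainly useful when you want to inspect the duplicates before deciding which one to keep.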
    
  • 2021-01-03 16:11

    For a really big list, probably the fastest approach that preserves the exact order of the remaining items is the following:

    biglist = [ 
        {'title':'U2 Band','link':'u2.com'}, 
        {'title':'ABC Station','link':'abc.com'}, 
        {'title':'Live Concert by U2','link':'u2.com'} 
    ]
    
    known_links = set()
    newlist = []
    
    for d in biglist:
        link = d['link']
        if link in known_links:
            continue
        newlist.append(d)
        known_links.add(link)
    
    biglist[:] = newlist
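
The same loop can be wrapped in a reusable function. The sketch below is one possible generalization, not part of the original answer: the name unique_by and its key parameter are hypothetical.

```python
def unique_by(items, key):
    """Return a new list holding the first item for each distinct key(item)."""
    seen = set()
    result = []
    for item in items:
        k = key(item)
        if k in seen:
            continue  # a previous item already claimed this key
        seen.add(k)
        result.append(item)
    return result


biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'},
]
alias = biglist  # a second reference to the same list object

# Slice assignment replaces the contents in place, so alias sees the change too.
biglist[:] = unique_by(biglist, key=lambda d: d['link'])
print(biglist)
print(alias is biglist)
```

Assigning to biglist[:] rather than to biglist keeps the same list object, which matters if other code holds a reference to it.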
    
  • 2021-01-03 16:12

    You can sort the list, using the link field of each dictionary as the sort key, then iterate through the list once and remove duplicates (or rather, create a new list with duplicates removed, as is the Python idiom), like so:

    # sort the list using the 'link' item as the sort key
    biglist.sort(key=lambda elt: elt['link'])
    
    newbiglist = []
    for item in biglist:
        if not newbiglist or item['link'] != newbiglist[-1]['link']:
            newbiglist.append(item)
    

    This code will give you the first element (relative ordering in the original biglist) for any group of "duplicates". This is true because the .sort() algorithm used by Python is guaranteed to be a stable sort -- it does not change the order of elements determined to be equal to one another (in this case, elements with the same link).
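
The same sort-then-scan idea can also be written with itertools.groupby from the standard library. This is an alternative sketch rather than the answer's own code; it uses sorted() so the original list is left untouched, and it relies on the same stability guarantee. The helper name by_link is an assumption for illustration.

```python
from itertools import groupby
from operator import itemgetter

biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'},
]

by_link = itemgetter('link')

# sorted() is stable, so items sharing a link keep their original relative order.
# groupby() yields one group per run of equal links; next() takes its first item.
newbiglist = [
    next(group)
    for _, group in groupby(sorted(biglist, key=by_link), key=by_link)
]
print(newbiglist)
```

Note that the result comes out ordered by link, not by position in the original list.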
