Remove duplicates from a list of dictionaries when only one of the key values is different

误落风尘 2020-12-12 02:37

I have seen some similar answers, but I can't find something specific for this case:

I have a list of dictionaries like this:

[
 {"element":Bla,


        
5 Answers
  • 2020-12-12 03:06

    Apologies for the terrible variable names. There is probably a cleaner way, but this should work:

    seen = {(item["element"], item["version"]): False for item in mylist}
    
    output = []
    for item in mylist:
        item_key = (item["element"], item["version"])
        if not seen[item_key]:
            output.append(item)
            seen[item_key] = True
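
    The same idea can be wrapped in a reusable function. This is only a minimal sketch: `dedupe_by_keys` and its `keys` parameter are names made up for illustration, and a plain set stands in for the pre-filled dictionary above.

```python
def dedupe_by_keys(items, keys):
    """Keep the first dict seen for each combination of values under `keys`.

    `dedupe_by_keys` and `keys` are hypothetical names, not part of the
    original answer. The values stored under `keys` must be hashable.
    """
    seen = set()
    output = []
    for item in items:
        # Build a hashable key from just the fields that define a duplicate.
        item_key = tuple(item[k] for k in keys)
        if item_key not in seen:
            seen.add(item_key)
            output.append(item)
    return output


mylist = [
    {"element": "Bla", "version": 2, "date": "12/04/12"},
    {"element": "Bla", "version": 2, "date": "12/05/12"},
    {"element": "Bla", "version": 3, "date": "12/04/12"},
]
unique = dedupe_by_keys(mylist, ("element", "version"))
```

    With the sample data, `unique` keeps the first and third dictionaries.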
    
  • 2020-12-12 03:16

    Pandas can solve this quickly:

    import pandas as pd
    Bla = "Bla"
    d = [
    {"element":Bla, "version":2, "date":"12/04/12"},
    {"element":Bla, "version":2, "date":"12/05/12"},
    {"element":Bla, "version":3, "date":"12/04/12"}
    ]
    df = pd.DataFrame(d)
    df[~df.drop("date", axis=1).duplicated()]
    

    output:

           date element  version
    0  12/04/12     Bla        2
    2  12/04/12     Bla        3
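
    An alternative spelling, shown as a sketch: `drop_duplicates` accepts a `subset` of columns to compare and keeps the first row of each group by default. The column names come from the example above; converting back with `to_dict("records")` is an assumption about the shape you want at the end.

```python
import pandas as pd

d = [
    {"element": "Bla", "version": 2, "date": "12/04/12"},
    {"element": "Bla", "version": 2, "date": "12/05/12"},
    {"element": "Bla", "version": 3, "date": "12/04/12"},
]
df = pd.DataFrame(d)

# Compare rows on "element" and "version" only; keep="first" is the default.
deduped = df.drop_duplicates(subset=["element", "version"])

# Convert back to a list of dictionaries if that is the shape you need.
records = deduped.to_dict("records")
```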
    
  • 2020-12-12 03:22

    You say you have a lot of other keys in the dictionary not mentioned in the question.

    Here is an O(n) algorithm to do what you need:

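    >>> from pprint import pprint
    >>> seen = set()
    >>> result = []
    >>> for d in dicts:
    ...     h = d.copy()
    ...     h.pop('date')                   # ignore the date when comparing
    ...     h = tuple(sorted(h.items()))    # hashable, and independent of key order
    ...     if h not in seen:
    ...         result.append(d)
    ...         seen.add(h)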
    
    >>> pprint(result)
    [{'date': '12/04/12', 'element': 'Bla', 'version': 2},
     {'date': '12/04/12', 'element': 'Bla', 'version': 3}]
    

    h is a copy of the dict, and the date key is removed from it with pop.

    A tuple is then built from the remaining items, because a tuple of hashable values is itself hashable and can be added to a set.

    If h has not been seen before, we append d to result and add h to seen. Additions to seen are O(1) on average, as are lookups (h not in seen), so the whole loop is O(n).

    At the end, result contains only the first dict for each distinct h value.
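
    The same steps can be packaged as a function. This sketch uses a frozenset of items instead of a tuple, so the key does not depend on key order at all. It assumes every value is hashable, and `dedupe_ignoring` and `ignore` are illustrative names rather than anything from the answer.

```python
def dedupe_ignoring(dicts, ignore=("date",)):
    """Drop dicts that match an earlier one on every key not in `ignore`.

    Hypothetical helper for illustration; all values must be hashable.
    """
    seen = set()
    result = []
    for d in dicts:
        # frozenset is hashable and does not depend on key order.
        h = frozenset((k, v) for k, v in d.items() if k not in ignore)
        if h not in seen:
            seen.add(h)
            result.append(d)
    return result


dicts = [
    {"element": "Bla", "version": 2, "date": "12/04/12"},
    {"version": 2, "element": "Bla", "date": "12/05/12"},  # same data, different key order
    {"element": "Bla", "version": 3, "date": "12/04/12"},
]
result = dedupe_ignoring(dicts)
```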

  • 2020-12-12 03:22

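    You could use the "unique_everseen" recipe from the itertools documentation to create a new list. It is a recipe to copy into your code, not a function in the itertools module itself; the more_itertools package also provides it.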

    list(unique_everseen(original_list, key=lambda e: '{element}@{version}'.format(**e)))
    

    If your "key" needs to be wider than the lambda I have written (to accommodate more values), then it's probably worth extracting it to a function:

    def key_without_date(element):
        # Join every value except 'date'; sorting by key keeps the result independent of key order.
        return '@'.join("{}".format(v) for k, v in sorted(element.items()) if k != 'date')
    
    list(unique_everseen(original_list, key=key_without_date))
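
    Since `unique_everseen` cannot be imported from `itertools`, it has to be defined somewhere. Below is a simplified sketch of the recipe, for illustration only (the version in the itertools documentation is more optimized). The tuple-based key function is my own choice here, and `original_list` is sample data.

```python
def unique_everseen(iterable, key=None):
    """Yield unique elements in order, remembering every key seen so far.

    Simplified from the recipe in the itertools documentation.
    """
    seen = set()
    for element in iterable:
        k = element if key is None else key(element)
        if k not in seen:
            seen.add(k)
            yield element


original_list = [
    {"element": "Bla", "version": 2, "date": "12/04/12"},
    {"element": "Bla", "version": 2, "date": "12/05/12"},
    {"element": "Bla", "version": 3, "date": "12/04/12"},
]

# A tuple key avoids formatting the values into a string first.
unique = list(unique_everseen(original_list, key=lambda e: (e["element"], e["version"])))
```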
    
  • 2020-12-12 03:30

    This works:

    LoD = [
        {"element": 'Bla', "version": 2, 'list': [1, 2, 3], "date": "12/04/12"},
        {"element": 'Bla', "version": 2, 'list': [1, 2, 3], "date": "12/05/12"},
        {"element": 'Bla', "version": 3, 'list': [1, 2, 3], "date": "12/04/12"},
    ]

    LoDcopy = []
    seen = set()

    for d in LoD:
        dc = d.copy()
        del dc['date']   # ignore the date when comparing
        s = str(dc)      # the string form is hashable even though 'list' is not
        if s in seen:
            continue
        seen.add(s)
        LoDcopy.append(d)

    print(LoDcopy)


    prints (on Python 3.7+, wrapped here for readability):

    [{'element': 'Bla', 'version': 2, 'list': [1, 2, 3], 'date': '12/04/12'},
     {'element': 'Bla', 'version': 3, 'list': [1, 2, 3], 'date': '12/04/12'}]
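
    Using the string form of the dict as the key is what lets this cope with unhashable values such as the list, but that string depends on key order. Here is a sketch of a variant that uses `json.dumps(..., sort_keys=True)` as a canonical key instead. It assumes every value is JSON-serializable, and `canonical_key` is a made-up helper name.

```python
import json


def canonical_key(d, ignore=("date",)):
    """Serialize `d` without the ignored keys, with keys in sorted order.

    Hypothetical helper for illustration; values must be JSON-serializable.
    """
    trimmed = {k: v for k, v in d.items() if k not in ignore}
    return json.dumps(trimmed, sort_keys=True)


LoD = [
    {"element": "Bla", "version": 2, "list": [1, 2, 3], "date": "12/04/12"},
    {"list": [1, 2, 3], "version": 2, "element": "Bla", "date": "12/05/12"},  # other key order
    {"element": "Bla", "version": 3, "list": [1, 2, 3], "date": "12/04/12"},
]

seen = set()
LoDcopy = []
for d in LoD:
    s = canonical_key(d)
    if s in seen:
        continue
    seen.add(s)
    LoDcopy.append(d)
```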
    