Inverting a dictionary when some of the original values are identical

北海茫月 2021-01-19 23:57

Say I have a dictionary called word_counter_dictionary that counts how many times each word appears in the document, in the form {'word': number}. For example,

5 answers
  • 2021-01-20 00:41

    For getting the largest elements of some dataset an inverted dictionary might not be the best data structure.

    Either put the items in a sorted list (this example assumes you want the two most frequent words):

    word_counter_dictionary = {'first':1, 'second':2, 'third':3, 'fourth':2}
    counter_word_list = sorted((count, word) for word, count in word_counter_dictionary.items())
    

    Result:

    >>> print(counter_word_list[-2:])
    [(2, 'second'), (3, 'third')]
    

    Or use Python's included batteries (heapq.nlargest in this case):

    import heapq, operator
    print(heapq.nlargest(2, word_counter_dictionary.items(), key=operator.itemgetter(1)))
    

    Result:

    [('third', 3), ('second', 2)]
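
    Another standard-library option, offered here only as a sketch: if the counts can be wrapped in a collections.Counter (an assumption, since the question only shows a plain dict), its most_common method returns the same kind of (word, count) list. The name top_two is illustrative.

```python
from collections import Counter

word_counter_dictionary = {'first': 1, 'second': 2, 'third': 3, 'fourth': 2}

# Counter accepts an existing mapping of item -> count.
word_counter = Counter(word_counter_dictionary)

# most_common(n) returns the n highest counts as (word, count) pairs,
# highest first; equal counts keep the order in which they were first seen.
top_two = word_counter.most_common(2)
print(top_two)
```

    If the words are counted from raw text in the first place, building the Counter directly would avoid the intermediate dict.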
    
  • 2021-01-20 00:50

    What you can do is make each value of the inverted dictionary a list of all the words that share the same count:

    word_counter_dictionary = {'first':1, 'second':2, 'third':3, 'fourth':2}
    
    inverted_dictionary = {}
    for word, count in word_counter_dictionary.items():
        # Words with the same count are collected in one list under that count.
        if count in inverted_dictionary:
            inverted_dictionary[count].append(word)
        else:
            inverted_dictionary[count] = [word]
    
    print(inverted_dictionary)
    # {1: ['first'], 2: ['second', 'fourth'], 3: ['third']}
    
  • 2021-01-20 00:52

    Here's a version that doesn't "invert" the dictionary:

    >>> import operator
    >>> A = {'a':10, 'b':843, 'c': 39, 'd': 10}
    >>> B = sorted(A.items(), key=operator.itemgetter(1), reverse=True)
    >>> B
    [('b', 843), ('c', 39), ('a', 10), ('d', 10)]
    

    Instead, it creates a list that is sorted, highest to lowest, by value.

    To get the top 25, you simply slice it: B[:25].

    And here's one way to get the keys and values separated (after putting them into a list of tuples):

    >>> [x[0] for x in B]
    ['b', 'c', 'a', 'd']
    >>> [x[1] for x in B]
    [843, 39, 10, 10]
    

    or

    >>> C, D = zip(*B)
    >>> C
    ('b', 'c', 'a', 'd')
    >>> D
    (843, 39, 10, 10)
    

    Note that if you only want the keys or only the values (not both), it is simpler to extract them before sorting. These are just examples of how to work with the list of tuples.

  • 2021-01-20 00:53

    Python dicts do NOT allow repeated keys, so you can't use a simple dictionary to store multiple elements with the same key (1 in your case). For your example, I'd rather have a list as the value of your inverted dictionary, and store in that list the words that share the number of appearances, like:

    inverted_dictionary = {}
    for key in word_counter_dictionary:
        new_key = word_counter_dictionary[key]
        if new_key in inverted_dictionary:
            inverted_dictionary[new_key].append(key)
        else:
            inverted_dictionary[new_key] = [key]
    

    In order to get the 25 most repeated words, you should iterate through the (sorted) keys in the inverted_dictionary and store the words:

    common_words = []
    for key in sorted(inverted_dictionary.keys(), reverse=True):
        if len(common_words) < 25:
            common_words.extend(inverted_dictionary[key])
        else: 
            break
    
    common_words = common_words[:25] # In case there are more than 25 words
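
    The two steps above can be combined into one function, sketched here under a few assumptions: the helper name most_common_words and its limit parameter are hypothetical, and dict.setdefault replaces the if/else only for brevity. Note that when several words tie at the cut-off, the final slice keeps whichever of them come first in the dictionary.

```python
def most_common_words(word_counts, limit):
    """Return up to `limit` words, highest count first (hypothetical helper)."""
    # Step 1: invert the dictionary, grouping words under their count.
    inverted = {}
    for word, count in word_counts.items():
        inverted.setdefault(count, []).append(word)

    # Step 2: walk the counts from highest to lowest, collecting words
    # until there are at least `limit` of them.
    common_words = []
    for count in sorted(inverted, reverse=True):
        if len(common_words) >= limit:
            break
        common_words.extend(inverted[count])

    # The last group added may overshoot, so trim to the limit.
    return common_words[:limit]


counts = {'first': 1, 'second': 2, 'third': 3, 'fourth': 2}
top_two = most_common_words(counts, 2)
print(top_two)
```

    With the sample dictionary, 'second' and 'fourth' both have a count of 2, so asking for two words keeps 'second' and drops 'fourth'.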
    
  • 2021-01-20 00:54

    A defaultdict is a good fit for this:

    word_counter_dictionary = {'first':1, 'second':2, 'third':3, 'fourth':2}
    from collections import defaultdict
    
    d = defaultdict(list)
    for key, value in word_counter_dictionary.items():
        # A missing count starts out as an empty list automatically.
        d[value].append(key)
    
    print(d)
    

    Output:

    defaultdict(<class 'list'>, {1: ['first'], 2: ['second', 'fourth'], 3: ['third']})
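
    Once the counts are grouped this way, reading off the most frequent words is a matter of looking at the largest keys. A minimal sketch, assuming the same sample data; the names highest_count and ranked are illustrative only:

```python
from collections import defaultdict

word_counter_dictionary = {'first': 1, 'second': 2, 'third': 3, 'fourth': 2}

# Invert: each count maps to the list of words that have that count.
d = defaultdict(list)
for word, count in word_counter_dictionary.items():
    d[count].append(word)

# The largest key is the highest count; its list holds every word tied there.
highest_count = max(d)
print(highest_count, d[highest_count])

# Sorting the (count, words) pairs in reverse ranks the groups from most
# to least frequent. Counts are unique keys, so the lists are never compared.
ranked = sorted(d.items(), reverse=True)
print(ranked)
```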
    