Inverting a dictionary when some of the original values are identical


Say I have a dictionary called word_counter_dictionary that counts how many times each word appears in a document, in the form {'word': number}. For example,

5 Answers

遇见更好的自我

2021-01-20 00:41

For getting the largest elements of a dataset, an inverted dictionary might not be the best data structure.

Either put the items in a sorted list (this example assumes you want the two most frequent words):

word_counter_dictionary = {'first':1, 'second':2, 'third':3, 'fourth':2}
counter_word_list = sorted((count, word) for word, count in word_counter_dictionary.items())

Result:

>>> print(counter_word_list[-2:])
[(2, 'second'), (3, 'third')]

Or use Python's included batteries (heapq.nlargest in this case):

import heapq, operator
print(heapq.nlargest(2, word_counter_dictionary.items(), key=operator.itemgetter(1)))

Result:

[('third', 3), ('second', 2)]
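
A third option, not part of the answer above, is `collections.Counter` from the standard library. The sketch below assumes the same sample dictionary and uses `most_common` as an alternative to calling `heapq.nlargest` yourself:

```python
from collections import Counter

# Sample data taken from the answer above.
word_counter_dictionary = {'first': 1, 'second': 2, 'third': 3, 'fourth': 2}

# Counter accepts an existing mapping of item -> count.
counter = Counter(word_counter_dictionary)

# most_common(n) returns the n (word, count) pairs with the highest counts,
# ordered from most to least common.
top_two = counter.most_common(2)
print(top_two)
```

This is mostly convenient when the counts are built with a `Counter` in the first place, since no separate sorting or inversion step is needed.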


北荒

2021-01-20 00:50

What you can do is make each value of the inverted dictionary a list of the words that share the same count:

word_counter_dictionary = {'first':1, 'second':2, 'third':3, 'fourth':2}

inverted_dictionary = {}
for key in word_counter_dictionary:
    new_key = word_counter_dictionary[key]
    if new_key in inverted_dictionary:
        inverted_dictionary[new_key].append(str(key))
    else:
        inverted_dictionary[new_key] = [str(key)]

print(inverted_dictionary)

Output:

{1: ['first'], 2: ['second', 'fourth'], 3: ['third']}


深忆病人

2021-01-20 00:52
Here's a version that doesn't "invert" the dictionary:
```
>>> import operator
>>> A = {'a':10, 'b':843, 'c': 39, 'd': 10}
>>> B = sorted(A.items(), key=operator.itemgetter(1), reverse=True)
>>> B
[('b', 843), ('c', 39), ('a', 10), ('d', 10)]
```
Instead, it creates a list that is sorted, highest to lowest, by value.

To get the top 25, you simply slice it: B[:25].
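
The sort-then-slice idea can be wrapped in a small helper. This is only a sketch: the name `top_n_by_value` is made up for illustration and does not come from the answer.

```python
import operator


def top_n_by_value(mapping, n):
    """Return the n (key, value) pairs with the largest values, highest first."""
    ranked = sorted(mapping.items(), key=operator.itemgetter(1), reverse=True)
    # Slicing past the end of a list is safe, so n may exceed len(mapping).
    return ranked[:n]


A = {'a': 10, 'b': 843, 'c': 39, 'd': 10}
print(top_n_by_value(A, 2))   # [('b', 843), ('c', 39)]
print(len(top_n_by_value(A, 25)))  # 4, because A only has four entries
```

Because the slice simply stops at the end of the list, asking for the top 25 of a shorter dictionary returns everything rather than raising an error.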

And here's one way to get the keys and values separated (after putting them into a list of tuples):
```
>>> [x[0] for x in B]
['b', 'c', 'a', 'd']
>>> [x[1] for x in B]
[843, 39, 10, 10]
```
or
```
>>> C, D = zip(*B)
>>> C
('b', 'c', 'a', 'd')
>>> D
(843, 39, 10, 10)
```
Note that if you only want the keys or only the values (and not both), you should extract them earlier. These are just examples of how to handle the list of tuples.

攒了一身酷

2021-01-20 00:53

Python dicts do NOT allow repeated keys, so you can't use a simple dictionary to store multiple elements with the same key (1 in your case). For your example, I'd rather have a list as the value of your inverted dictionary, and store in that list the words that share the number of appearances, like:

inverted_dictionary = {}
for key in word_counter_dictionary:
    new_key = word_counter_dictionary[key]
    if new_key in inverted_dictionary:
        inverted_dictionary[new_key].append(key)
    else:
        inverted_dictionary[new_key] = [key]

In order to get the 25 most repeated words, you should iterate through the (sorted) keys in the inverted_dictionary and store the words:

common_words = []
for key in sorted(inverted_dictionary.keys(), reverse=True):
    if len(common_words) < 25:
        common_words.extend(inverted_dictionary[key])
    else: 
        break

common_words = common_words[:25] # In case there are more than 25 words
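
Put together, the two steps above could look roughly like this. The function name `most_common_words` and its `limit` parameter are assumptions made for this sketch, and the sample data is borrowed from the other answers; dictionary insertion order (Python 3.7+) is assumed for the order of tied words.

```python
def most_common_words(word_counts, limit):
    """Return up to `limit` words, most frequent first, via an inverted dict."""
    # Step 1: invert, grouping words that share a count into one list.
    inverted = {}
    for word, count in word_counts.items():
        inverted.setdefault(count, []).append(word)

    # Step 2: walk the counts from highest to lowest, collecting words.
    common = []
    for count in sorted(inverted, reverse=True):
        if len(common) >= limit:
            break
        common.extend(inverted[count])

    # The last group added may overshoot the limit, so trim it.
    return common[:limit]


word_counter_dictionary = {'first': 1, 'second': 2, 'third': 3, 'fourth': 2}
print(most_common_words(word_counter_dictionary, 2))  # ['third', 'second']
```

Note that when several words tie at the cutoff, the final slice keeps only some of them ('fourth' is dropped here even though it has the same count as 'second').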


渐次进展

2021-01-20 00:54

A defaultdict is perfect for this:

from collections import defaultdict

word_counter_dictionary = {'first':1, 'second':2, 'third':3, 'fourth':2}

# Missing keys start out as an empty list, so no membership check is needed.
d = defaultdict(list)
for key, value in word_counter_dictionary.items():
    d[value].append(key)

print(d)

Output:

defaultdict(<class 'list'>, {1: ['first'], 2: ['second', 'fourth'], 3: ['third']})
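
One thing to watch for when reading from the result: indexing a missing key on a `defaultdict` silently creates an empty entry. The sketch below, which is an addition and not part of the answer, shows two ways around that, using `.get()` or converting to a plain `dict` once the inversion is done.

```python
from collections import defaultdict

word_counter_dictionary = {'first': 1, 'second': 2, 'third': 3, 'fourth': 2}

d = defaultdict(list)
for word, count in word_counter_dictionary.items():
    d[count].append(word)

# .get() does not trigger the default factory, so no key 5 is created.
words_with_count_5 = d.get(5, [])
print(words_with_count_5)  # []
print(5 in d)              # False

# A plain dict copy raises KeyError on missing keys instead of growing.
inverted = dict(d)
print(inverted)
```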
