The similar method from the nltk module produces different results on different machines. Why?

后端未结

关注

 2  996

执念已碎 2021-01-07 23:49

I have taught a few introductory classes to text mining with Python, and the class tried the similar method with the provided practice texts. Some students got different re

2条回答

被撕碎了的回忆 (楼主)

2021-01-08 00:16

In your example there are 40 other words which have exactly one context in common with the word 'monstrous'. In the similar function a Counter object is used to count the words with similar contexts and then the most common ones (default 20) are printed. Since all 40 have the same frequency the order can differ.

From the doc of Counter.most_common:

Elements with equal counts are ordered arbitrarily

I checked the frequency of the similar words with this code (which is essentially a copy of the relevant part of the function code):

from nltk.book import *
from nltk.util import tokenwrap
from nltk.compat import Counter

word = 'monstrous'
num = 20

text1.similar(word)

wci = text1._word_context_index._word_to_contexts

if word in wci.conditions():
            contexts = set(wci[word])
            fd = Counter(w for w in wci.conditions() for c in wci[w]
                          if c in contexts and not w == word)
            words = [w for w, _ in fd.most_common(num)]
            # print(tokenwrap(words))

print(fd)
print(len(fd))
print(fd.most_common(num))

Output: (different runs give different output for me)

Counter({'doleful': 1, 'curious': 1, 'delightfully': 1, 'careful': 1, 'uncommon': 1, 'mean': 1, 'perilous': 1, 'fearless': 1, 'imperial': 1, 'christian': 1, 'trustworthy': 1, 'untoward': 1, 'maddens': 1, 'true': 1, 'contemptible': 1, 'subtly': 1, 'wise': 1, 'lamentable': 1, 'tyrannical': 1, 'puzzled': 1, 'vexatious': 1, 'part': 1, 'gamesome': 1, 'determined': 1, 'reliable': 1, 'lazy': 1, 'passing': 1, 'modifies': 1, 'few': 1, 'horrible': 1, 'candid': 1, 'exasperate': 1, 'pitiable': 1, 'abundant': 1, 'mystifying': 1, 'mouldy': 1, 'loving': 1, 'domineering': 1, 'impalpable': 1, 'singular': 1})

0 讨论(0)

查看其它2个回答