I have taught a few introductory classes on text mining with Python, and the class tried the similar method with the provided practice texts. Some students got different re
In your example there are 40 other words which have exactly one context in common with the word 'monstrous'.
In the similar function a Counter object is used to count the words with similar contexts, and then the most common ones (default 20) are printed. Since all 40 have the same frequency, the order can differ.
From the documentation of Counter.most_common:
Elements with equal counts are ordered arbitrarily
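To make the tie problem concrete, here is a minimal sketch with toy data (the four words are just a hypothetical sample, not the real NLTK counts). Two counters with identical counts can return a different top 2, because the result for tied elements follows the counter's internal order rather than anything meaningful. On Python 3.7 and later that internal order is the order in which elements were first seen; on older versions it depended on string hashing, which could change from run to run.

```python
from collections import Counter

# Hypothetical toy data: every word occurs exactly once, so all counts tie.
words = ["doleful", "curious", "careful", "mean"]

a = Counter(words)            # elements seen in the original order
b = Counter(reversed(words))  # same elements, seen in reverse order

# The two counters hold exactly the same counts...
print(a == b)

# ...but which tied elements make it into the top 2 is not determined
# by the counts alone, so the two results may differ.
print(a.most_common(2))
print(b.most_common(2))
```

With 40 tied words and a cutoff of 20, the same effect decides which half of the candidates gets printed.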
I checked the frequency of the similar words with this code (which is essentially a copy of the relevant part of the function code):
from nltk.book import *
from nltk.util import tokenwrap
from collections import Counter  # the standard-library Counter works regardless of NLTK version

word = 'monstrous'
num = 20

text1.similar(word)

# Same steps as the similar function, but keeping the counts for inspection
wci = text1._word_context_index._word_to_contexts
if word in wci.conditions():
    contexts = set(wci[word])
    # Count, for every other word, how many contexts it shares with `word`
    fd = Counter(w for w in wci.conditions() for c in wci[w]
                 if c in contexts and w != word)
    words = [w for w, _ in fd.most_common(num)]
    # print(tokenwrap(words))
    print(fd)
    print(len(fd))
    print(fd.most_common(num))
Output: (different runs give different output for me)
Counter({'doleful': 1, 'curious': 1, 'delightfully': 1, 'careful': 1, 'uncommon': 1, 'mean': 1, 'perilous': 1, 'fearless': 1, 'imperial': 1, 'christian': 1, 'trustworthy': 1, 'untoward': 1, 'maddens': 1, 'true': 1, 'contemptible': 1, 'subtly': 1, 'wise': 1, 'lamentable': 1, 'tyrannical': 1, 'puzzled': 1, 'vexatious': 1, 'part': 1, 'gamesome': 1, 'determined': 1, 'reliable': 1, 'lazy': 1, 'passing': 1, 'modifies': 1, 'few': 1, 'horrible': 1, 'candid': 1, 'exasperate': 1, 'pitiable': 1, 'abundant': 1, 'mystifying': 1, 'mouldy': 1, 'loving': 1, 'domineering': 1, 'impalpable': 1, 'singular': 1})
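If the goal is for every student to see the same list, one option is to break ties explicitly instead of relying on most_common. The following is only a sketch: stable_most_common is a hypothetical helper, not part of NLTK or collections, and the alphabetical tie-break is an arbitrary choice made for reproducibility.

```python
from collections import Counter


def stable_most_common(counter, n=20):
    """Return the n most common (item, count) pairs, ties broken alphabetically.

    Hypothetical helper for illustration; not an NLTK or stdlib function.
    """
    # Sort by descending count first, then by the item itself.
    ranked = sorted(counter.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


# Toy data standing in for the fd counter built above.
fd = Counter(["wise", "true", "lazy", "mean", "true"])
print(stable_most_common(fd, 3))
# [('true', 2), ('lazy', 1), ('mean', 1)]
```

Applied to the fd counter from the code above, stable_most_common(fd, num) would return the same 20 words on every run, at the cost of favoring words that come early in the alphabet among the ties.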