The similar method from the nltk module produces different results on different machines. Why?

执念已碎 2021-01-07 23:49

I have taught a few introductory classes on text mining with Python, and the class tried the similar method with the provided practice texts. Some students got different re

2 answers
  •  被撕碎了的回忆
    2021-01-08 00:16

    In your example there are 40 other words that share exactly one context with the word 'monstrous'. The similar function uses a Counter object to count the words with shared contexts and then prints the most common ones (20 by default). Since all 40 have the same count, which 20 are printed, and in what order, can differ between runs and machines.

    From the documentation for Counter.most_common:

    Elements with equal counts are ordered arbitrarily


    I checked the frequency of the similar words with this code (which is essentially a copy of the relevant part of the function code):

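    from nltk.book import *
    from nltk.util import tokenwrap
    # Standard-library Counter; older NLTK versions exposed it as nltk.compat.Counter
    from collections import Counter

    word = 'monstrous'
    num = 20

    text1.similar(word)

    # Private index populated by the similar() call above: word -> contexts
    wci = text1._word_context_index._word_to_contexts

    if word in wci.conditions():
        contexts = set(wci[word])
        # Count, for every other word, how many contexts it shares with `word`
        fd = Counter(w for w in wci.conditions() for c in wci[w]
                     if c in contexts and not w == word)
        words = [w for w, _ in fd.most_common(num)]
        # print(tokenwrap(words))

        # Inspect the raw counts; kept inside the `if` so fd is always defined
        print(fd)
        print(len(fd))
        print(fd.most_common(num))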
    

    Output: (different runs give different output for me)

    Counter({'doleful': 1, 'curious': 1, 'delightfully': 1, 'careful': 1, 'uncommon': 1, 'mean': 1, 'perilous': 1, 'fearless': 1, 'imperial': 1, 'christian': 1, 'trustworthy': 1, 'untoward': 1, 'maddens': 1, 'true': 1, 'contemptible': 1, 'subtly': 1, 'wise': 1, 'lamentable': 1, 'tyrannical': 1, 'puzzled': 1, 'vexatious': 1, 'part': 1, 'gamesome': 1, 'determined': 1, 'reliable': 1, 'lazy': 1, 'passing': 1, 'modifies': 1, 'few': 1, 'horrible': 1, 'candid': 1, 'exasperate': 1, 'pitiable': 1, 'abundant': 1, 'mystifying': 1, 'mouldy': 1, 'loving': 1, 'domineering': 1, 'impalpable': 1, 'singular': 1})
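
If the class needs identical output on every machine, one option is to break ties explicitly instead of relying on the order most_common happens to return. The following is only a minimal sketch: `stable_most_common` is a hypothetical helper (not part of NLTK or collections), and the two small Counters are made-up stand-ins for the real counts, built from the same words in different orders.

```python
from collections import Counter


def stable_most_common(counter, n=None):
    """Rank by descending count, breaking ties alphabetically (hypothetical helper)."""
    ranked = sorted(counter.items(), key=lambda item: (-item[1], item[0]))
    return ranked if n is None else ranked[:n]


# Toy data standing in for the real Counter: same words, different insertion order.
fd_a = Counter(['wise', 'true', 'mean', 'few', 'true'])
fd_b = Counter(['few', 'mean', 'true', 'wise', 'true'])

top_a = stable_most_common(fd_a, 3)
top_b = stable_most_common(fd_b, 3)

print(top_a)           # the top 3 after explicit tie-breaking
print(top_a == top_b)  # the ranking no longer depends on insertion order
```

In the checking code above, `stable_most_common(fd, num)` could stand in for `fd.most_common(num)`. The alphabetical tie-break is an arbitrary choice; it makes the selection reproducible, not more meaningful, since all tied words are equally similar by this measure.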
    
