wordnet

How to normalize similarity measures from WordNet

拟墨画扇 submitted on 2020-01-31 05:29:05

Question: I am trying to calculate semantic similarity between two words. I am using WordNet-based similarity measures, i.e. the Resnik measure (RES), Lin measure (LIN), Jiang and Conrath measure (JNC), and Banerjee and Pedersen measure (BNP). To do that, I am using nltk and WordNet 3.0. Next, I want to combine the similarity values obtained from the different measures. To do that I need to normalize the similarity values, as some measures give values between 0 and 1 while others give values greater than 1. So, my…
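One common way to put unbounded scores such as RES or JCN on the same [0, 1] footing as LIN is min-max rescaling within each measure's batch of word pairs. The sketch below is pure Python; the raw values are made-up stand-ins for what nltk's `res_similarity`/`jcn_similarity` calls might return, not real outputs.

```python
# Illustrative sketch: min-max normalization of raw similarity scores.
# The numbers are hypothetical stand-ins, not real WordNet outputs.

def minmax_normalize(scores):
    """Rescale a list of raw scores to the [0, 1] range."""
    lo, hi = min(scores), max(scores)
    if hi == lo:  # avoid division by zero when all scores are equal
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

# LIN already lies in [0, 1]; RES and JCN are unbounded, so each
# measure is normalized over its own batch of pairs before combining.
res_scores = [2.3, 7.9, 0.5]          # hypothetical RES values
print(minmax_normalize(res_scores))   # largest maps to 1.0, smallest to 0.0
```

Each measure must be normalized separately, since their raw ranges differ; only after rescaling do the per-measure values become comparable enough to average or weight.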

How to convert token list into wordnet lemma list using nltk?

淺唱寂寞╮ submitted on 2020-01-17 15:02:16

Question: I have a list of tokens extracted from a PDF source. I am able to preprocess and tokenize the text, but I want to loop through the tokens and convert each token in the list to its lemma in the WordNet corpus. My token list looks like this:

['0000', 'Everyone', 'age', 'remembers', 'Þ', 'rst', 'heard', 'contest', 'I', 'sitting', 'hideout', 'watching', ...]

There are no lemmas for words like 'Everyone', '0000', 'Þ' and many more, which I need to eliminate. But for words like 'age',…
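The filtering step can be sketched as follows. This is a toy, self-contained version: the `KNOWN_LEMMAS` set and `TOY_LEMMAS` table are hypothetical stand-ins for what one would get from nltk (`wn.synsets(token)` to test membership, `WordNetLemmatizer.lemmatize` to get the lemma).

```python
# Toy sketch of "lemmatize, then keep only tokens WordNet knows".
# KNOWN_LEMMAS and TOY_LEMMAS are hypothetical stand-ins for nltk's
# wordnet corpus lookups; a real pipeline would query nltk instead.

KNOWN_LEMMAS = {"age", "remember", "hear", "contest", "sit", "watch"}

TOY_LEMMAS = {"remembers": "remember", "heard": "hear",
              "sitting": "sit", "watching": "watch"}

def to_wordnet_lemmas(tokens):
    """Lemmatize each token and keep only lemmas known to the corpus."""
    out = []
    for tok in tokens:
        lemma = TOY_LEMMAS.get(tok.lower(), tok.lower())
        if lemma in KNOWN_LEMMAS:  # drops '0000', 'Þ', 'Everyone', ...
            out.append(lemma)
    return out

tokens = ['0000', 'Everyone', 'age', 'remembers', 'Þ', 'rst']
print(to_wordnet_lemmas(tokens))  # -> ['age', 'remember']
```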

NLTK Wordnet Download Out of Date

佐手、 submitted on 2020-01-17 03:44:12

Question: New to Python, trying to get started with NLTK. After a rough time installing Python on my Windows 7 64-bit system, I am now having a rough time downloading WordNet and the other NLTK data packages located here: http://nltk.org/nltk_data/ Some packages download; some say "Out of Date".

import nltk
nltk.download()

When I use the above to download, the program doesn't let me cancel if I hit the cancel button. So I just shut it down and go directly to the link above to try and download it manually.
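For a manual install, the downloaded zip can be unzipped into one of the directories NLTK searches for data. The sketch below only lists the conventional default locations; the exact search path on a given machine is an assumption here (NLTK itself reports it in the error message when a resource is missing).

```python
# Sketch: conventional places to unzip a manually downloaded package
# (e.g. wordnet.zip -> <dir>/corpora/wordnet/) so nltk can find it.
# These default paths are assumptions, not read from nltk at runtime.
import os
import sys

candidate_dirs = [
    os.path.expanduser(os.path.join("~", "nltk_data")),
    os.path.join(sys.prefix, "nltk_data"),
]
for d in candidate_dirs:
    print(d)
```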

Implicit parameter and function

谁说胖子不能爱 submitted on 2020-01-14 07:27:47

Question: I have a problem with implicit parameters in Haskell (GHC). I have a function f that assumes the implicit parameter x, and I would like to encapsulate it in a context by applying f to g:

f :: (?x :: Int) => Int -> Int
f n = n + ?x

g :: (Int -> Int) -> (Int -> Int)
g t = let ?x = 5 in t

But when I try to evaluate g f 10 I get an error that x is not bound, e.g.:

Unbound implicit parameter (?x::Int) arising from a use of `f'
In the first argument of `g', namely `f'
In the second argument…
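The error arises because g's type, (Int -> Int) -> (Int -> Int), forces the ?x constraint on f to be discharged at the call site g f 10, where ?x is not bound. A sketch of one standard fix (it needs RankNTypes) is to let the argument keep its constraint, so it is only discharged inside g where the let-binding provides ?x:

```haskell
{-# LANGUAGE ImplicitParams, RankNTypes #-}

f :: (?x :: Int) => Int -> Int
f n = n + ?x

-- The argument type keeps the (?x :: Int) constraint, so it is
-- discharged inside g, where the let-binding supplies ?x = 5.
g :: ((?x :: Int) => Int -> Int) -> (Int -> Int)
g t = let ?x = 5 in t

main :: IO ()
main = print (g f 10)  -- prints 15
```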

NLTK Wordnet Synset for word phrase

房东的猫 submitted on 2020-01-13 10:08:20

Question: I'm working with the Python NLTK WordNet API. I'm trying to find the best synset that represents a group of words. If I need to find the best synset for something like "school & office supplies", I'm not sure how to go about this. So far I've tried finding the synsets for the individual words and then computing the best lowest common hypernym, like this:

def find_best_synset(category_name):
    text = word_tokenize(category_name)
    tags = pos_tag(text)
    node_synsets = []
    for word, tag in tags:
        pos =…
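The lowest-common-hypernym idea can be illustrated in isolation over a hand-made taxonomy. The PARENTS table below is a hypothetical stand-in for WordNet's hypernym links (what synset.hypernyms() walks in nltk); the real corpus is a DAG with multiple parents, so this single-parent sketch is a simplification.

```python
# Toy sketch of "lowest common hypernym" over a hand-made taxonomy.
# PARENTS is a hypothetical stand-in for wordnet's hypernym links.

PARENTS = {
    "school": "institution", "institution": "entity",
    "supply": "artifact", "artifact": "entity",
}

def ancestors(node):
    """Return node plus all its hypernym ancestors, nearest first."""
    chain = [node]
    while node in PARENTS:
        node = PARENTS[node]
        chain.append(node)
    return chain

def lowest_common_hypernym(a, b):
    """First ancestor of a that is also an ancestor of b."""
    b_ancestors = set(ancestors(b))
    for node in ancestors(a):
        if node in b_ancestors:
            return node
    return None

print(lowest_common_hypernym("school", "supply"))  # -> entity
```

In nltk the same query is a single call, synset_a.lowest_common_hypernyms(synset_b); the sketch just shows what that call computes.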

Python NLTK Lemmatization of the word 'further' with wordnet

旧时模样 submitted on 2020-01-09 05:33:50

Question: I'm working on a lemmatizer using Python, NLTK and the WordNetLemmatizer. Here is a random text that outputs what I was expecting:

from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet
lem = WordNetLemmatizer()
lem.lemmatize('worse', pos=wordnet.ADJ)  # specifying that 'worse' is an adjective
Output: 'bad'
lem.lemmatize('worse', pos=wordnet.ADV)  # specifying that 'worse' is an adverb
Output: 'worse'

Well, everything here is fine. The behaviour is the…
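Getting the right pos argument usually means mapping the tagger's Penn Treebank tags onto WordNet's four POS codes first. A small helper like the one below is a common pattern; the single-letter codes match the constants nltk exposes as wordnet.ADJ ('a'), wordnet.VERB ('v'), wordnet.NOUN ('n') and wordnet.ADV ('r').

```python
# Sketch of a Penn Treebank tag -> wordnet POS mapping, used to pick
# the pos= argument before calling WordNetLemmatizer.lemmatize.

def penn_to_wordnet(tag):
    """Map a Penn Treebank tag prefix to a wordnet POS letter."""
    if tag.startswith("J"):
        return "a"  # adjective (JJ, JJR, JJS)
    if tag.startswith("V"):
        return "v"  # verb (VB, VBD, ...)
    if tag.startswith("R"):
        return "r"  # adverb (RB, RBR, RBS)
    return "n"      # default to noun

print(penn_to_wordnet("RBR"))  # -> r ('further' is often tagged RBR)
```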

Getting the closest noun from a stemmed word

限于喜欢 submitted on 2020-01-05 10:09:46

Question: Short version: if I have a stemmed word, say 'comput' for 'computing' or 'sugari' for 'sugary', is there a way to construct its closest noun form, i.e. 'computer' or 'sugar' respectively?

Longer version: I'm using Python, NLTK and WordNet to perform a few semantic similarity tasks on a bunch of words. I noticed that most sem-sim scores work well only for nouns, while adjectives and verbs don't give any results. Understanding the inaccuracies involved, I wanted to convert a word from its…
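One simple heuristic is prefix matching against a noun vocabulary, taking the shortest candidate. The sketch below is self-contained: NOUNS is a toy stand-in for WordNet's noun lemma list (wn.all_lemma_names(pos='n') in nltk), and the shortest-match rule is an assumption, not an established algorithm.

```python
# Toy sketch: recover a noun from a stem by prefix matching against a
# noun list. NOUNS is a hypothetical stand-in for wordnet's nouns.

NOUNS = ["computer", "computation", "sugar", "table"]

def closest_noun(stem):
    """Shortest noun sharing a prefix with the stem, or None."""
    candidates = [n for n in NOUNS
                  if n.startswith(stem) or stem.startswith(n)]
    return min(candidates, key=len) if candidates else None

print(closest_noun("comput"))  # -> computer
print(closest_noun("sugari"))  # -> sugar ('sugari' starts with 'sugar')
```

The two-way prefix test handles both directions: a stem that is a prefix of a noun ('comput' -> 'computer') and a stem that over-runs the noun ('sugari' -> 'sugar').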
