Question
I always get a lot of help from Stack Overflow. Thank you, as always.
I am doing simple natural language processing using spaCy.
I'm working on filtering out words by measuring the similarity between words.
I wrote and ran the simple code shown in the spaCy documentation, but the results don't match the documentation.
import spacy

nlp = spacy.load('en_core_web_lg')
tokens = nlp('dog cat banana')
for token1 in tokens:
    for token2 in tokens:
        # pairwise similarity between all tokens
        sim = token1.similarity(token2)
        print("{:>6s}, {:>6s}: {}".format(token1.text, token2.text, sim))
The result of the code is below.
dog, dog: 1.0
dog, cat: 2.307269867164827e-21
dog, banana: 0.0
cat, dog: 2.307269867164827e-21
cat, cat: 1.0
cat, banana: -0.04468117654323578
banana, dog: -7.828739256116838e+17
banana, cat: -8.242222286053048e+17
banana, banana: 1.0
In particular, the similarity between "dog" and "cat" should be about 0.8, but instead it is a very small value.
In addition, the similarity between "dog" and "banana" is 0.0, yet the similarity between "banana" and "dog" is -7.828739256116838e+17.
I don't know how to fix it. Please help me.
Answer 1:
First, install the large EN model (or all models):
python3 -m spacy.en.download all
Next, try the sample code from the documentation using:
nlp = spacy.load('en_core_web_md')
If that doesn't work, instead try loading:
nlp = spacy.load('en')
After making the changes above, the result matches the documentation:
python3 /tmp/c.py
dog, dog: 1.000000078333395
dog, cat: 0.8016855098942641
dog, banana: 0.2432764518408807
cat, dog: 0.8016855098942641
cat, cat: 1.0000001375986456
cat, banana: 0.2815436412709355
banana, dog: 0.2432764518408807
banana, cat: 0.2815436412709355
banana, banana: 1.000000107068369
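If the numbers still look wrong, it is worth checking whether the loaded pipeline actually contains word vectors; tokens without vectors have vector_norm 0.0 and their similarity scores are not meaningful. A minimal check (assuming spaCy 2.x and the en_core_web_md model) could look like this:

import spacy

nlp = spacy.load('en_core_web_md')

# A real vector table has a non-zero shape, e.g. (20000, 300);
# (0, 0) means no vectors were bundled with the model.
print(nlp.vocab.vectors.shape)

doc = nlp('dog cat banana')
for token in doc:
    # has_vector is False and vector_norm is 0.0 when a token has no vector,
    # in which case similarity() is not meaningful
    print(token.text, token.has_vector, token.vector_norm)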
Answer 2:
I finally solved this problem.
Just add the line import numpy as np.
That's all.
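For reference, a minimal sketch of the full script with that import added (assuming the same en_core_web_lg model as in the question):

import numpy as np  # the added import
import spacy

nlp = spacy.load('en_core_web_lg')
tokens = nlp('dog cat banana')
for token1 in tokens:
    for token2 in tokens:
        # pairwise similarity between token vectors
        sim = token1.similarity(token2)
        print("{:>6s}, {:>6s}: {}".format(token1.text, token2.text, sim))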
Source: https://stackoverflow.com/questions/52388291/spacy-similarity-method-doesnt-not-work-correctly