Is it possible to speed up Wordnet Lemmatizer?

Submitted by 半腔热情 on 2019-12-03 02:31:43

I've used the lemmatizer like this

    from nltk.stem.wordnet import WordNetLemmatizer  # to download corpora: python -m nltk.downloader all
    lmtzr = WordNetLemmatizer()  # create a lemmatizer object
    lemma = lmtzr.lemmatize('cats')

It is not slow at all on my machine. There is no need to connect to the web to do this.

It doesn't query the internet; NLTK reads WordNet from your local disk. On the first query, NLTK loads WordNet from disk into memory:

>>> from time import time
>>> from nltk.stem.wordnet import WordNetLemmatizer
>>> lemmatize = WordNetLemmatizer().lemmatize
>>> t = time(); lemmatize('dogs'); print(time() - t, 'seconds')
'dog'
3.38199806213 seconds
>>> t = time(); lemmatize('cats'); print(time() - t, 'seconds')
'cat'
0.000236034393311 seconds

It is still rather slow if you have to lemmatize many thousands of phrases. However, if many of those queries are redundant, you can get a significant speedup by caching the function's results:

    from functools import lru_cache  # on Python 2, install the functools32 backport instead
    from nltk.stem import WordNetLemmatizer

    wnl = WordNetLemmatizer()
    lemmatize = lru_cache(maxsize=50000)(wnl.lemmatize)

    lemmatize('dogs')
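The caching behaviour itself can be seen without NLTK by wrapping any pure function. In this sketch, `slow_lemmatize` is a hypothetical stand-in for `wnl.lemmatize` (its suffix-stripping rule is a toy, not real lemmatization); the call counter shows that repeated queries never reach the expensive function:

```python
from functools import lru_cache

CALLS = 0

def slow_lemmatize(word):
    # Hypothetical stand-in for wnl.lemmatize: counts how often
    # the underlying (expensive) function actually runs.
    global CALLS
    CALLS += 1
    return word.rstrip('s')  # toy rule, not real lemmatization

lemmatize = lru_cache(maxsize=50000)(slow_lemmatize)

lemmatize('dogs')  # miss: calls slow_lemmatize
lemmatize('dogs')  # hit: served from the cache
lemmatize('cats')  # miss

print(CALLS)                  # 2 underlying calls for 3 queries
print(lemmatize.cache_info()) # hits=1, misses=2
```

`cache_info()` is a standard attribute of any `lru_cache`-wrapped function, so you can use it on the real cached `wnl.lemmatize` too, to check your hit rate before tuning `maxsize`.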