I use NLTK with wordnet in my project. I did the installation manually on my PC, with pip:
pip3 install nltk --user
in a terminal, then nltk.download() in a Python shell to download wordnet.
I want to automate these steps with a setup.py file, but I don't know a good way to install wordnet.
For the moment, I have this piece of code after the call to setup ("nltk" is in the install_requires list of the call to setup):
import sys
if 'install' in sys.argv:
    import nltk
    nltk.download("wordnet")
Is there a better way to do this?
I managed to install the NLTK data in setup.py by overriding cmdclass with my own Install class:
from setuptools import setup, find_packages
from setuptools.command.install import install as _install

class Install(_install):
    def run(self):
        _install.do_egg_install(self)
        import nltk
        nltk.download("popular")

setup(...
    cmdclass={'install': Install},
    ...
    install_requires=[
        'nltk',
    ],
    setup_requires=['nltk']
    ...
)
It is important to call do_egg_install() in your run() method so that nltk is installed before import nltk is executed (see also "python setuptools install_requires is ignored when overriding cmdclass"). Also, don't forget to add nltk to setup_requires.
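One possible refinement, sketched under assumptions rather than taken from the answer above: check whether the data is already present before downloading, so that repeated installs do not fetch it again. The helper name ensure_nltk_data and its nltk_module parameter are hypothetical (the parameter exists only so the function can be exercised without a real NLTK install). It relies on nltk.data.find raising LookupError for a missing resource, and assumes "corpora/wordnet" is the resource path for the wordnet package.

```python
def ensure_nltk_data(resource="corpora/wordnet", package="wordnet", nltk_module=None):
    """Download `package` only if `resource` is not already on NLTK's data path.

    Returns True if a download was triggered, False if the data was found.
    `nltk_module` is a hypothetical hook for passing in a stand-in object.
    """
    if nltk_module is None:
        # Imported lazily so setup.py itself can be loaded before nltk exists.
        import nltk as nltk_module
    try:
        nltk_module.data.find(resource)
        return False
    except LookupError:
        # Resource is missing from every directory on the NLTK data path.
        nltk_module.download(package)
        return True
```

Inside run(), this could be called after do_egg_install() in place of the bare nltk.download(...) call.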
You can also automate installation with a shell script, for example, running (after pip installing nltk):
python -m nltk.downloader -d /usr/share/nltk_data wordnet
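If you would rather drive that same command from Python (for example from a build or deployment script), a minimal sketch might look like the following. The function names downloader_command and download_with_cli are hypothetical. The sketch uses sys.executable so the downloader runs under the interpreter that nltk was installed into, and it assumes the target directory is writable by the current user.

```python
import subprocess
import sys


def downloader_command(packages, download_dir=None):
    """Build the argument list for `python -m nltk.downloader`."""
    cmd = [sys.executable, "-m", "nltk.downloader"]
    if download_dir:
        # -d selects the directory the data is written to.
        cmd += ["-d", download_dir]
    return cmd + list(packages)


def download_with_cli(packages, download_dir=None):
    """Run the downloader and raise CalledProcessError on a non-zero exit."""
    subprocess.run(downloader_command(packages, download_dir), check=True)
```

Calling download_with_cli(["wordnet"], "/usr/share/nltk_data") would then run the same command as the shell line above.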
Source: https://stackoverflow.com/questions/26799894/installing-nltk-data-dependencies-in-setup-py-script