Question
Is it possible to install python packages in a Google Dataproc cluster after the cluster is created and running?
I tried to use "pip install xxxxxxx" in the master command line but it does not seem to work.
Google's Dataproc documentation does not mention this situation.
Answer 1:
This is generally not possible after the cluster is created. I recommend using an initialization action to do this.
As you've noticed, pip is also not available by default, so you'll want to run easy_install pip followed by a pip install command.
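The two steps above could be wrapped in an initialization action script along the following lines. This is only a minimal sketch: the PIP_PACKAGES variable and the install_python_packages function are names made up for illustration, and it assumes easy_install is on the PATH of the cluster image, as the answer implies.

```shell
#!/bin/bash
# Sketch of an initialization action script (not an official one).
# PIP_PACKAGES is a hypothetical placeholder: fill in a space-separated
# list of the packages you need, e.g. "numpy pandas".
PIP_PACKAGES=""

install_python_packages() {
  # pip is not available by default, so bootstrap it with easy_install,
  # then install every package passed as an argument.
  easy_install pip && pip install "$@"
}

# Do nothing unless a package list was filled in above.
if [ -n "$PIP_PACKAGES" ]; then
  # Left unquoted on purpose so each package becomes its own argument.
  install_python_packages $PIP_PACKAGES
fi
```

To use a script like this, you would upload it to a GCS bucket of your own and pass its gs:// path to the --initialization-actions flag of gcloud dataproc clusters create, so that it runs on each node when the cluster is created.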
Finally, if you intend to use this cluster in any automation, and/or you want hermeticity, I recommend creating a wheel that you store in GCS and download in the init action. You'd then install your wheel. Wheels have the added benefit of being faster than installing many packages from pip directly.
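The wheel approach might look roughly like this inside the init action. Again this is a sketch under assumptions: WHEEL_URI and install_wheel_from_gcs are hypothetical names, the bucket path is something you would supply yourself, and pip is assumed to have been installed already by an earlier step.

```shell
#!/bin/bash
# Sketch: download a prebuilt wheel from GCS and install it.
# WHEEL_URI is a hypothetical placeholder: set it to the gs:// path
# of a wheel you have uploaded to your own bucket.
WHEEL_URI=""

install_wheel_from_gcs() {
  wheel_uri="$1"
  # Keep the original file name: pip expects a valid wheel file name.
  local_path="/tmp/$(basename "$wheel_uri")"
  # Copy the wheel to local disk, then install it from there.
  gsutil cp "$wheel_uri" "$local_path" && pip install "$local_path"
}

# Do nothing unless a wheel location was filled in above.
if [ -n "$WHEEL_URI" ]; then
  install_wheel_from_gcs "$WHEEL_URI"
fi
```

Because the wheel is built once and pinned in your bucket, every cluster created with this script gets the same package contents regardless of what is currently published on PyPI.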
2019 Update
See this tutorial on how to configure Python environment on Dataproc: https://cloud.google.com/dataproc/docs/tutorials/python-configuration
Source: https://stackoverflow.com/questions/50279905/how-to-install-python-packages-in-a-google-dataproc-cluster