How to install python packages in a Google Dataproc cluster

廉价感情. Submitted on 2020-05-28 23:20:14

Question


Is it possible to install python packages in a Google Dataproc cluster after the cluster is created and running?

I tried running "pip install xxxxxxx" on the master node's command line, but it does not seem to work.

Google's Dataproc documentation does not mention this situation.


Answer 1:


This is generally not possible after the cluster is created. I recommend using an initialization action instead, which runs on each node when the cluster is created.

As you've noticed, pip is also not available by default, so you'll want to run "easy_install pip" followed by your "pip install" command.
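As a rough sketch of how an initialization action gets attached, the Python helper below only builds the "gcloud dataproc clusters create" command line and prints it. The cluster name, region, bucket, and script name are hypothetical placeholders, and the script in GCS is assumed to contain the "easy_install pip" and "pip install" commands mentioned above.

```python
import shlex


def build_create_command(cluster, region, init_action_uri):
    """Return the gcloud argv that creates a cluster with one initialization action."""
    return [
        "gcloud", "dataproc", "clusters", "create", cluster,
        "--region", region,
        # The script at this GCS URI runs on every node during cluster creation.
        "--initialization-actions", init_action_uri,
    ]


if __name__ == "__main__":
    # All values below are illustrative placeholders, not real resources.
    cmd = build_create_command(
        cluster="example-cluster",
        region="us-central1",
        init_action_uri="gs://example-bucket/install-packages.sh",
    )
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to run anywhere; the resulting line can be pasted into a shell where gcloud is configured.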

Finally, if you intend to use this cluster in any automation, or you want hermetic, reproducible installs, I recommend building a wheel, storing it in GCS, and downloading it in the init action. You'd then install your wheel. Wheels have the added benefit of installing faster than pulling many packages from pip directly.
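The wheel approach boils down to two steps on each node: copy the wheel from GCS, then install it from the local path. Here is a minimal sketch of those steps in Python, assuming gsutil and pip are already available on the node; the wheel URI, the "/tmp" destination, and the "python" interpreter name are hypothetical choices for illustration.

```python
import posixpath
import subprocess


def wheel_install_steps(wheel_uri, dest_dir="/tmp", python="python"):
    """Return the two commands that fetch a wheel from GCS and install it."""
    if not wheel_uri.startswith("gs://") or not wheel_uri.endswith(".whl"):
        raise ValueError("expected a gs:// URI ending in .whl")
    # Keep the original file name: pip reads the wheel's metadata from it.
    local_path = posixpath.join(dest_dir, posixpath.basename(wheel_uri))
    return [
        ["gsutil", "cp", wheel_uri, local_path],
        [python, "-m", "pip", "install", local_path],
    ]


def run_steps(steps):
    """Run each command in order, stopping at the first failure."""
    for step in steps:
        subprocess.run(step, check=True)


if __name__ == "__main__":
    # Hypothetical wheel location; only prints the commands for review.
    uri = "gs://example-bucket/wheels/mypkg-1.0-py3-none-any.whl"
    for step in wheel_install_steps(uri):
        print(" ".join(step))
```

Calling run_steps on the returned list from inside an init action would perform the download and install; it is kept separate so the commands can be inspected without touching the node.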

2019 Update

See this tutorial on how to configure the Python environment on Dataproc: https://cloud.google.com/dataproc/docs/tutorials/python-configuration



Source: https://stackoverflow.com/questions/50279905/how-to-install-python-packages-in-a-google-dataproc-cluster
