Question
I have been using PaperMill to execute my Python notebooks periodically. To execute a compute-intensive notebook, I need to connect to a remote kernel running in my EMR cluster.
With Jupyter Notebook I can do that by starting the Jupyter server with jupyter notebook --gateway-url=http://my-gateway-server:8888, and I am then able to execute my code on the remote kernel. But how do I get my local Python code (run through PaperMill) to use the remote kernel? What changes do I need to make in the kernel manager to connect to a remote kernel?
One related SO answer I could find is here. It suggests setting up port forwarding to the remote server and initializing a KernelManager with the connection file from the server. I am not able to do this, since BlockingKernelManager is no longer in IPython.zmq, and I would also prefer an HTTP connection, like Jupyter uses.
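For reference, the connection-file approach from that related answer looks roughly like the sketch below with the current jupyter_client package (where the old IPython.zmq kernel classes ended up). The connection-file path and the assumption that the kernel's ports are already forwarded (e.g. via an SSH tunnel) are placeholders, not part of the original question:

```python
# Minimal sketch, assuming the kernel's connection file has been copied from the
# remote host and its ZMQ ports are reachable locally (e.g. via `ssh -L` forwarding).
from jupyter_client import BlockingKernelClient

client = BlockingKernelClient()
# Hypothetical path to the connection file copied from the EMR master node.
client.load_connection_file("/tmp/remote-kernel.json")
client.start_channels()
client.wait_for_ready(timeout=60)

# Run a statement on the remote kernel and wait for the execution reply.
msg_id = client.execute("print('hello from the remote kernel')")
reply = client.get_shell_msg(timeout=60)
print(reply["content"]["status"])

client.stop_channels()
```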
Answer 1:
Hacky approach - set up a shell script to do the following:
- Create a Python environment on your EMR master node as the hadoop user
- Install sparkmagic in that environment and configure all kernels as described in the sparkmagic README.md
- Copy your notebook to the master node, or use it directly from its S3 location

Run it with papermill:
papermill s3://path/to/notebook/input.ipynb s3://path/to/notebook/output.ipynb -p param=1
Steps 1 and 2 are one-time requirements if your cluster master node is the same every time.
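If you would rather drive the run from Python than from the shell, papermill's Python API does the same thing as the command above; this is just a sketch, with the same placeholder S3 paths and parameter name:

```python
# Minimal sketch: the papermill CLI call above, driven from Python on the EMR master node.
import papermill as pm

pm.execute_notebook(
    "s3://path/to/notebook/input.ipynb",   # input notebook (papermill reads S3 URIs directly)
    "s3://path/to/notebook/output.ipynb",  # executed copy written back to S3
    parameters={"param": 1},               # equivalent to `-p param=1`
)
```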
A slightly better approach:
- Set up a remote kernel in your Jupyter itself: REMOTE KERNEL
- Execute with papermill as a normal notebook by selecting this remote kernel (see the sketch below)
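A rough sketch of that second step: once the remote kernel shows up in `jupyter kernelspec list` under some name, papermill can target it via its kernel option. The kernel name and notebook paths below are placeholders, not names from the answer:

```python
# Minimal sketch: run a notebook against an already-registered remote kernelspec.
# "remote-emr-kernel" is a placeholder; use the name shown by `jupyter kernelspec list`.
import papermill as pm

pm.execute_notebook(
    "input.ipynb",
    "output.ipynb",
    kernel_name="remote-emr-kernel",  # equivalent to `papermill ... -k remote-emr-kernel`
)
```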
I am using both approaches for different use cases and they seem to work fine for now.
Answer 2:
I have written a Word document with concise steps for the setup; I hope you find it useful. My method uses Jupyter Notebook. You can ignore the Kite part; all you have to do is use Hydrogen with your editor. Here it is:
https://drive.google.com/file/d/1INVxvJVrnoj8z0iBqesYa1F1lbHgFSGu/view?usp=drivesdk
Source: https://stackoverflow.com/questions/59977601/connect-to-remote-python-kernel-from-python-code