Question
I am trying to convert a Spark DataFrame to a pandas DataFrame in a Jupyter notebook on EMR, and I am getting the following error.
The pandas library is installed on the master node under my user, and using the Spark shell (pyspark) on that master node I am able to convert the df to a pandas df.
The following command has been executed on all the master nodes:
pip --no-cache-dir install pandas --user
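Since pip --user installs into one user's home directory for one specific interpreter, a quick diagnostic (not from the original post, just a hedged sketch) is to compare the interpreter the notebook kernel actually runs against the one pip used:

import sys
import site

# The PySpark kernel on EMR may run as a different user (e.g. livy) and/or a
# different Python than the shell where `pip --user` was executed, so a
# --user install can be invisible to the notebook.
print(sys.executable)               # interpreter behind this kernel
print(site.getusersitepackages())   # where `pip --user` installs for this user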
The following works on the master node, but not from the PySpark notebook:
import Pandas as pd
Error
Traceback (most recent call last):
ModuleNotFoundError: No module named 'Pandas'
Update:
I can run the following code from a Python notebook:
import pandas as pd
pd.DataFrame(["a", "b"], columns=['q_data'])
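The question does not show the exact conversion code; a minimal reproduction of the failing step might look like the following (spark is the notebook's SparkSession, and the column name q_data is borrowed from the update above):

# Minimal sketch of the failing step: collect a small Spark DataFrame to
# the driver as a pandas DataFrame.
sdf = spark.createDataFrame([("a",), ("b",)], ["q_data"])
pdf = sdf.toPandas()  # raises if pandas is not importable on the driver
print(pdf)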
Answer 1:
You need pandas on the driver node: when converting to a pandas DataFrame, all of the data is collected to the driver and converted there.
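Since toPandas() executes on the driver, one way to make pandas visible there is to install it into the notebook session itself (a sketch; sc.install_pypi_package is an EMR Notebooks feature available from EMR release 5.26.0, which may not match your cluster):

# Install pandas into this notebook's Spark session (EMR Notebooks only).
sc.install_pypi_package("pandas")
sc.list_packages()  # confirm pandas now appears for this session

# A Spark DataFrame (here called `df`, a placeholder) can then be
# collected on the driver:
# pdf = df.toPandas()

Otherwise, installing pandas system-wide on the master node (e.g. sudo python3 -m pip install pandas) makes it visible regardless of which user the notebook kernel runs as.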
Source: https://stackoverflow.com/questions/62556754/converting-spark-dataframe-to-pandas-dataframe-importerror-pandas-0-19-2-m