converting spark dataframe to pandas dataframe - ImportError: Pandas >= 0.19.2 must be installed

Submitted by 强颜欢笑 on 2020-08-10 06:12:12

Question


I am trying to convert a Spark dataframe to a pandas dataframe in a Jupyter notebook on EMR, and I am getting the following error.

The pandas library is installed on the master node under my user, and using the Spark shell (pyspark) on that master node I am able to convert the df to a pandas df.

The following command has been executed on all the master nodes:

 pip --no-cache-dir install pandas --user

The following works on the master node, but not from the pyspark notebook:

import Pandas as pd

Error

No module named 'Pandas'
Traceback (most recent call last):
ModuleNotFoundError: No module named 'Pandas'

Update:


I can run the following code from a Python notebook:

import pandas as pd 
pd.DataFrame(["a", "b"], columns=['q_data']) 

Answer 1:


You need pandas on the driver node: when converting to a pandas dataframe, all the data is collected to the driver and the conversion happens there, so pandas must be importable in the driver's Python environment.
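The point above can be sketched as a driver-side preflight check. This is a minimal illustration, not from the original answer: `pandas_available` is a hypothetical helper, and `spark_df` is assumed to be an existing Spark DataFrame.

```python
import importlib.util

def pandas_available():
    # DataFrame.toPandas() collects all rows to the driver and builds the
    # pandas DataFrame there, so pandas must be importable in the *driver's*
    # Python interpreter (not just on the worker nodes).
    return importlib.util.find_spec("pandas") is not None

# Hypothetical usage inside a PySpark session:
# if pandas_available():
#     pdf = spark_df.toPandas()  # spark_df assumed to exist
# else:
#     raise ImportError("Install pandas in the driver's Python environment")
```

Note that when the notebook kernel runs as a different user than the one who ran `pip install --user`, the user-site install may not be visible to the driver, which is consistent with the symptoms described in the question.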



Source: https://stackoverflow.com/questions/62556754/converting-spark-dataframe-to-pandas-dataframe-importerror-pandas-0-19-2-m
