No module named graphframes Jupyter Notebook

北慕城南 提交于 2019-12-01 19:53:15

问题


I'm following this installation guide but have the following problem with using graphframes

from pyspark import SparkContext
sc =SparkContext()
!pyspark --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11
from graphframes import *

--------------------------------------------------------------------------- ImportError Traceback (most recent call last) in () ----> 1 from graphframes import *

ImportError: No module named graphframes

I'm not sure wether it is possible to install package on the following way. But I'll appreciate your advice and help.


回答1:


Good question!

Open up your bashrc file, and type export SPARK_OPTS="--packages graphframes:graphframes:0.5.0-spark2.1-s_2.11". Once you saved your bashrc file, close it and type source .bashrc.

Finally, open up your notebook and type:

from pyspark import SparkContext
sc = SparkContext()
sc.addPyFile('/home/username/spark-2.3.0-bin-hadoop2.7/jars/graphframes-0.5.0-spark2.1-s_2.11.jar')

After that, you may able to run it.




回答2:


I'm using jupyter notebook in docker, trying to get graphframes working. First, I used the method in https://stackoverflow.com/a/35762809/2202107, I have:

import findspark
findspark.init()
import pyspark
import os

SUBMIT_ARGS = "--packages graphframes:graphframes:0.7.0-spark2.4-s_2.11 pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS

conf = pyspark.SparkConf()
sc = pyspark.SparkContext(conf=conf)
print(sc._conf.getAll())

Then by following this issue, we finally are able to import graphframes: https://github.com/graphframes/graphframes/issues/172

import sys
pyfiles = str(sc.getConf().get(u'spark.submit.pyFiles')).split(',')
sys.path.extend(pyfiles)
from graphframes import *



回答3:


The simplest way is to start jupyter with pyspark and graphframes is to start jupyter out from pyspark.

Just open your terminal and set the two environment variables and start pyspark with the graphframes package

export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook
pyspark --packages graphframes:graphframes:0.6.0-spark2.3-s_2.11

the advantage of this is also that if you later on want to run your code via spark-submit you can use the same start command



来源:https://stackoverflow.com/questions/50286139/no-module-named-graphframes-jupyter-notebook

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!