Question
I'm following this installation guide, but I run into the following problem when using graphframes:
from pyspark import SparkContext
sc = SparkContext()
!pyspark --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11
from graphframes import *
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
in ()
----> 1 from graphframes import *
ImportError: No module named graphframes
I'm not sure whether it is possible to install a package this way. I'd appreciate any advice and help.
Answer 1:
Good question!
Open your .bashrc file and add the line
export SPARK_OPTS="--packages graphframes:graphframes:0.5.0-spark2.1-s_2.11"
Save and close the file, then run:
source ~/.bashrc
Finally, open up your notebook and type:
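from pyspark import SparkContext
sc = SparkContext()
# Adjust this path to wherever the graphframes jar actually lives on your machine
sc.addPyFile('/home/username/spark-2.3.0-bin-hadoop2.7/jars/graphframes-0.5.0-spark2.1-s_2.11.jar')
from graphframes import *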
After that, you may be able to run it.
Answer 2:
I'm using Jupyter Notebook in Docker and was trying to get graphframes working. First, I used the method from https://stackoverflow.com/a/35762809/2202107:
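Hard-coding the jar path is fragile, since it differs between machines and Spark versions. A minimal sketch of looking the jar up instead, assuming it sits in one of a few candidate directories. The candidates (the Ivy cache `~/.ivy2/jars`, where `--packages` downloads by default, and the `jars/` folder of a Spark install) are assumptions, and `find_graphframes_jar` is a hypothetical helper, not part of pyspark or graphframes.

```python
import glob
import os
from typing import Iterable, Optional


def find_graphframes_jar(search_dirs: Iterable[str]) -> Optional[str]:
    """Return a graphframes*.jar from the first directory that contains one.

    Hypothetical helper. The candidate directories are assumptions, e.g. the
    Ivy cache (~/.ivy2/jars) that --packages downloads into by default, or
    the jars/ folder of a Spark installation. If a directory holds several
    matching jars, the first one in sorted filename order is returned.
    """
    for directory in search_dirs:
        pattern = os.path.join(os.path.expanduser(directory), "*graphframes*.jar")
        matches = sorted(glob.glob(pattern))
        if matches:
            return matches[0]
    return None


# Usage in the notebook (the directories are assumptions, adjust as needed):
# jar = find_graphframes_jar(["~/.ivy2/jars",
#                             os.path.join(os.environ.get("SPARK_HOME", ""), "jars")])
# if jar is not None:
#     sc.addPyFile(jar)
```

Returning `None` instead of raising lets the caller decide whether a missing jar is fatal.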
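import findspark
findspark.init()  # locate the Spark installation and make pyspark importable
import pyspark
import os
# Must be set before the SparkContext is created; "pyspark-shell" must come last
SUBMIT_ARGS = "--packages graphframes:graphframes:0.7.0-spark2.4-s_2.11 pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS
conf = pyspark.SparkConf()
sc = pyspark.SparkContext(conf=conf)
print(sc._conf.getAll())  # inspect the resulting Spark configuration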
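The important detail in this snippet is ordering: `PYSPARK_SUBMIT_ARGS` is only read when the first `SparkContext` launches the JVM, so it has to be set before `sc` is created (restart the kernel if a context already exists), and the value has to end with `pyspark-shell`. A small sketch of building that value for one or more packages; `make_submit_args` is a hypothetical helper, and it relies on `--packages` taking a comma-separated list of Maven coordinates.

```python
import os
from typing import Sequence


def make_submit_args(packages: Sequence[str]) -> str:
    """Build a PYSPARK_SUBMIT_ARGS value for the given Maven coordinates.

    Hypothetical helper. --packages takes a comma-separated list, and the
    trailing "pyspark-shell" is the primary-resource placeholder spark-submit
    expects when PySpark launches the JVM itself.
    """
    if not packages:
        return "pyspark-shell"
    return "--packages " + ",".join(packages) + " pyspark-shell"


# Must run before the first SparkContext is created in this kernel.
os.environ["PYSPARK_SUBMIT_ARGS"] = make_submit_args(
    ["graphframes:graphframes:0.7.0-spark2.4-s_2.11"]
)
```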
Then, following this issue, we were finally able to import graphframes: https://github.com/graphframes/graphframes/issues/172
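import sys
# spark.submit.pyFiles lists the jars resolved by --packages; adding them to
# sys.path lets Python import the graphframes package bundled inside the jar
pyfiles = str(sc.getConf().get(u'spark.submit.pyFiles')).split(',')
sys.path.extend(pyfiles)
from graphframes import *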
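Why the `sys.path` trick works: a jar is a zip archive, the graphframes jar typically bundles the `graphframes` Python package, and Python's built-in zipimport can import modules from any zip file on `sys.path`. The sketch below demonstrates that mechanism without Spark, using a throwaway archive with a hypothetical module `demo_pkg` standing in for the real jar. `add_pyfiles_to_path` is also a hypothetical helper; unlike the snippet above, it skips the literal `'None'` entry that `str(...)` produces when `spark.submit.pyFiles` is unset.

```python
import os
import sys
import tempfile
import zipfile
from typing import List, Optional


def add_pyfiles_to_path(pyfiles_conf: Optional[str]) -> List[str]:
    """Append each comma-separated entry of a spark.submit.pyFiles value to sys.path.

    Hypothetical helper: ignores None/empty values and paths already on
    sys.path. Returns the entries that were actually added.
    """
    added = []
    for entry in (pyfiles_conf or "").split(","):
        entry = entry.strip()
        if entry and entry not in sys.path:
            sys.path.append(entry)
            added.append(entry)
    return added


# Demonstration without Spark: build a zip (a jar is also a zip) containing a
# package, put it on sys.path, and import from it via zipimport.
tmp_dir = tempfile.mkdtemp()
archive = os.path.join(tmp_dir, "demo.jar")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("demo_pkg/__init__.py", "GREETING = 'imported from a jar'\n")

add_pyfiles_to_path(archive)
import demo_pkg  # resolved from inside demo.jar by zipimport

print(demo_pkg.GREETING)
```

In the notebook, the equivalent call would be `add_pyfiles_to_path(sc.getConf().get("spark.submit.pyFiles"))` followed by `from graphframes import *`.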
Answer 3:
The simplest way to start Jupyter with pyspark and graphframes is to launch Jupyter from pyspark itself.
Open a terminal, set the two environment variables, and start pyspark with the graphframes package:
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook
pyspark --packages graphframes:graphframes:0.6.0-spark2.3-s_2.11
Another advantage is that if you later want to run your code via spark-submit, you can start it the same way, with the same --packages option.
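To illustrate the point about spark-submit, the same `--packages` coordinate can be shared between the interactive launch and a batch job. A sketch that builds both argument lists from one constant; `GRAPHFRAMES_PKG`, `pyspark_notebook_command` and `spark_submit_command` are hypothetical names, `my_job.py` is a placeholder script, and the commands are only built here, not executed.

```python
import os
from typing import Dict, List, Tuple

# One place to change the graphframes version for both launch modes.
GRAPHFRAMES_PKG = "graphframes:graphframes:0.6.0-spark2.3-s_2.11"


def pyspark_notebook_command(package: str) -> Tuple[Dict[str, str], List[str]]:
    """Environment and argv that start Jupyter through pyspark, as in the answer."""
    env = dict(os.environ)  # copy, so the Jupyter settings stay local to this launch
    env["PYSPARK_DRIVER_PYTHON"] = "jupyter"
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook"
    return env, ["pyspark", "--packages", package]


def spark_submit_command(package: str, script: str) -> List[str]:
    """argv for running the same code as a batch job via spark-submit."""
    return ["spark-submit", "--packages", package, script]


env, argv = pyspark_notebook_command(GRAPHFRAMES_PKG)
# To actually launch (requires Spark on PATH): subprocess.run(argv, env=env)
print(argv)
print(spark_submit_command(GRAPHFRAMES_PKG, "my_job.py"))
```

The two Jupyter variables are kept in a copied environment rather than exported globally: spark-submit also reads `PYSPARK_DRIVER_PYTHON` for the driver, so leaving it set to `jupyter` would break the batch run.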
Source: https://stackoverflow.com/questions/50286139/no-module-named-graphframes-jupyter-notebook