My DataStax Spark doesn't work with my current Python version and I have no idea why


Question


When I configure DataStax Spark to use Python 2.7 as shown below, it doesn't work, and I don't know why; the error message follows. I would be very grateful for any suggestions. Thanks.

vi /etc/dse/spark/spark-env.sh
# Point DSE Spark at the Python 2.7 installation under /usr/local
export PYTHONHOME=/usr/local
export PYTHONPATH=/usr/local/lib/python2.7
export PYSPARK_PYTHON=/usr/local/bin/python2.7
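For context, an undefined-symbol failure like the one below generally means the interpreter and the C extension modules it loads (here /usr/local/lib/python2.7/lib-dynload/_io.so) come from different Python builds. A minimal check outside of Spark, assuming the same /usr/local layout as in spark-env.sh above, would look something like this:

# Run the interpreter with the same environment Spark would use;
# if "import io" already fails here, the Python installation itself
# is mismatched and the problem is unrelated to DSE/Spark.
PYTHONHOME=/usr/local PYTHONPATH=/usr/local/lib/python2.7 \
    /usr/local/bin/python2.7 -c "import io; print(io.__file__)"

# Show which build the interpreter is and where it searches for modules,
# to compare against the lib-dynload directory on PYTHONPATH.
/usr/local/bin/python2.7 -c "import sys; print(sys.version); print('\n'.join(sys.path))"

If the standalone import fails with the same undefined-symbol error, the /usr/local Python tree is likely inconsistent (for example, a lib-dynload directory left over from a different Python build), and no spark-env.sh setting alone will fix it.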

Error message:

Error from python worker:
  /usr/local/bin/python2.7: /usr/local/lib/python2.7/lib-dynload/_io.so: undefined symbol: _PyCodec_LookupTextEncoding
PYTHONPATH was:
  /usr/share/dse/spark/python/lib/pyspark.zip:/usr/share/dse/spark/python/lib/py4j-0.8.2.1-src.zip:/usr/share/dse/spark/lib/spark-core_2.10-1.4.2.2.jar:/usr/local/lib/python2.7
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:163)
        at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:86)
        at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
        at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:130)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:73)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:315)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:70)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Source: https://stackoverflow.com/questions/35716908/my-datastax-spark-doesnt-work-with-my-current-python-version-and-i-have-no-idea
