Using PySpark on Windows not working: py4j

半城伤御伤魂 submitted on 2019-12-23 06:00:32

Question


I installed Zeppelin on Windows using this tutorial and this one. I also installed Java 8 to avoid problems.

I'm now able to start the Zeppelin server, and I'm trying to run this code:

%pyspark
a=5*4
print("value = %i" % (a))
sc.version

I get an error related to py4j. I had other problems with this library before (same as here), and to avoid them I replaced the py4j library in both Zeppelin and Spark on my computer with the latest version, py4j 0.10.7.

This is the error I get:

Traceback (most recent call last):
  File "C:\Users\SHIRM~1.ARG\AppData\Local\Temp\zeppelin_pyspark-1240802621138907911.py", line 309, in <module>
    sc = _zsc_ = SparkContext(jsc=jsc, gateway=gateway, conf=conf)
  File "C:\Users\SHIRM.ARGUS\spark-2.3.2\spark-2.3.2-bin-hadoop2.7\python\pyspark\context.py", line 118, in __init__
    conf, jsc, profiler_cls)
  File "C:\Users\SHIRM.ARGUS\spark-2.3.2\spark-2.3.2-bin-hadoop2.7\python\pyspark\context.py", line 189, in _do_init
    self._javaAccumulator = self._jvm.PythonAccumulatorV2(host, port, auth_token)
  File "C:\Users\SHIRM.ARGUS\Documents\zeppelin-0.8.0-bin-all\interpreter\spark\pyspark\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1525, in __call__
  File "C:\Users\SHIRM.ARGUS\Documents\zeppelin-0.8.0-bin-all\interpreter\spark\pyspark\py4j-0.10.7-src.zip\py4j\protocol.py", line 332, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling None.org.apache.spark.api.python.PythonAccumulatorV2. Trace:

I googled it, but couldn't find anyone else who had run into it.

Does anyone have an idea how I can solve this?

Thanks


Answer 1:


I suspect you have Java 9 or 10 installed. Uninstall those versions and install a fresh copy of Java 8 from here: https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

Then set JAVA_HOME inside hadoop_env.cmd (open it with any text editor).

Note: Java 8 and 7 are the stable versions to use here, so uninstall any other existing versions of Java. Make sure JAVA_HOME points to a JDK (not a JRE).
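
To see which Java the machine actually resolves, a small script along these lines may help. It is only a sketch and not part of the original answer: the helper names parse_java_major and looks_like_jdk are made up for illustration, and it assumes a JDK can be recognized by a javac executable in its bin folder.

```python
import os
import re
import subprocess


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Java 8 and earlier report themselves as 1.x (for example "1.8.0_181").
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def looks_like_jdk(java_home):
    """Heuristic (an assumption): a JDK has javac in bin, a JRE does not."""
    bin_dir = os.path.join(java_home, "bin")
    return any(
        os.path.isfile(os.path.join(bin_dir, name))
        for name in ("javac", "javac.exe")
    )


def report():
    """Print JAVA_HOME and the version of the `java` found on PATH."""
    java_home = os.environ.get("JAVA_HOME")
    print("JAVA_HOME =", java_home)
    if java_home:
        print("looks like a JDK:", looks_like_jdk(java_home))
    try:
        # `java -version` writes its banner to stderr, not stdout.
        result = subprocess.run(
            ["java", "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except FileNotFoundError:
        print("java was not found on PATH")
        return
    print("Java major version:", parse_java_major(result.stderr))


if __name__ == "__main__":
    report()
```

If the reported major version is 9 or higher, or JAVA_HOME does not look like a JDK, that would be consistent with the situation this answer describes.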




Answer 2:


I faced the same problem today and fixed it by adding PYTHONPATH to the system environment variables, like this:
%SPARK_HOME%\python\lib\py4j;%SPARK_HOME%\python\lib\pyspark
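
The exact entries depend on what is actually present under %SPARK_HOME%\python\lib, so it can be worth listing them before editing the environment. The sketch below does that. The function name spark_python_paths is hypothetical, and the layout it looks for (a py4j-*-src.zip and a pyspark.zip inside python\lib) is an assumption about a standard Spark binary download, not something stated in this answer.

```python
import glob
import os


def spark_python_paths(spark_home):
    """Collect candidate PYTHONPATH entries from a Spark installation."""
    python_dir = os.path.join(spark_home, "python")
    lib_dir = os.path.join(python_dir, "lib")
    entries = [python_dir]
    # Assumed layout: python/lib holds py4j-<version>-src.zip and pyspark.zip.
    entries.extend(sorted(glob.glob(os.path.join(lib_dir, "py4j-*-src.zip"))))
    pyspark_zip = os.path.join(lib_dir, "pyspark.zip")
    if os.path.isfile(pyspark_zip):
        entries.append(pyspark_zip)
    return entries


if __name__ == "__main__":
    spark_home = os.environ.get("SPARK_HOME")
    if spark_home is None:
        print("SPARK_HOME is not set")
    else:
        # os.pathsep is ";" on Windows, matching the PYTHONPATH format above.
        print(os.pathsep.join(spark_python_paths(spark_home)))
```

If more than one py4j archive shows up, or the version differs from the one Zeppelin loads, that mismatch is a likely place to look first.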



Source: https://stackoverflow.com/questions/52646868/using-pyspark-on-windows-not-working-py4j
