Getting NullPointerException when running Spark Code in Zeppelin 0.7.1

Submitted anonymously (unverified) on 2019-12-03 02:51:02

Question:

I have installed Zeppelin 0.7.1. When I try to run the example Spark program (which comes with the Zeppelin Tutorial notebook), I get the following error:

java.lang.NullPointerException
    at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:38)
    at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:33)
    at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext_2(SparkInterpreter.java:391)
    at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:380)
    at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:146)
    at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:828)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:483)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

I have also set up the config file (zeppelin-env.sh) to point to my Spark installation and Hadoop configuration directory:

export SPARK_HOME="/${homedir}/sk"
export HADOOP_CONF_DIR="/${homedir}/hp/etc/hadoop"

The Spark version I am using is 2.1.0 and Hadoop is 2.7.3.

I am also using the default Spark interpreter configuration (so Spark is set to run in local mode).

Am I missing something here?

PS: I am able to connect to Spark from the terminal using spark-shell.

Answer 1:

I just found the solution to this issue for Zeppelin 0.7.2:

Root cause: Spark tries to set up a Hive context, but the HDFS service is not running; as a result the HiveContext ends up null and throws a NullPointerException.

Solution:
1. Set up SPARK_HOME [optional] and HDFS.
2. Start the HDFS service.
3. Restart the Zeppelin server (a command sketch follows below).
OR
1. Go to Zeppelin's interpreter settings.
2. Select the Spark interpreter.
3. Set zeppelin.spark.useHiveContext = false.
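
A minimal sketch of the first option, assuming HADOOP_HOME and ZEPPELIN_HOME point at your Hadoop and Zeppelin installations (those variable names are assumptions; adjust the paths to your layout):

$HADOOP_HOME/sbin/start-dfs.sh                  # start the HDFS daemons
$ZEPPELIN_HOME/bin/zeppelin-daemon.sh restart   # restart the Zeppelin server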



Answer 2:

Finally, I was able to find the reason. When I checked the logs in the ZL_HOME/logs directory, it turned out to be a Spark driver binding error. I added the following property in the Spark interpreter settings and it works fine now...
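
The property itself did not survive in this copy of the answer (it was likely an image). As a purely hypothetical example of a driver-bind workaround on Spark 2.1, and not necessarily the author's exact setting, you could pin the driver's bind address in the Spark interpreter's properties:

spark.driver.bindAddress=127.0.0.1   # hypothetical value; forces the driver to bind to loopback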

PS: This issue seems to come up mainly if you connect to a VPN...and I do connect to a VPN.



Answer 3:

Did you set SPARK_HOME correctly? I just wondered what sk is in your export SPARK_HOME="/${homedir}/sk".

(I just wanted to comment below your question but couldn't, due to my lack of reputation.)



Answer 4:

Solved it by adding this line at the top of the file common.sh, in the directory zeppelin-0.6.1/bin:

unset CLASSPATH



Answer 5:

Caused by: java.net.ConnectException: Connection refused (Connection refused)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
    ... 74 more
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:466)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:236)
    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
    ... 71 more
INFO [2017-11-20 17:51:55,288] ({pool-2-thread-4} SparkInterpreter.java[createSparkSession]:369) - Created Spark session with Hive support
ERROR [2017-11-20 17:51:55,290] ({pool-2-thread-4} Job.java[run]:181) - Job failed

It looks like the Hive Metastore service has not been started. You can start the Metastore service and try again:

hive --service metastore 
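
To verify the Metastore is reachable afterwards (assuming the default Thrift port 9083; adjust if yours differs):

nc -z localhost 9083 && echo "metastore reachable"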


Answer 6:

I was getting exactly the same exception with Zeppelin 0.7.2 on Windows 7. I had to make several configuration changes to get it working.

First, rename zeppelin-env.cmd.template to zeppelin-env.cmd and add an environment variable for PYTHONPATH. The file is located in the %ZEPPELIN_HOME%/conf folder.

set PYTHONPATH=%SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-0.10.4-src.zip;%SPARK_HOME%\python\lib\pyspark.zip 

Open zeppelin.cmd from %ZEPPELIN_HOME%/bin and add %SPARK_HOME% and %ZEPPELIN_HOME%; these should be the first lines of the script. I left the value of %SPARK_HOME% blank because I was using the embedded Spark library, and I added %ZEPPELIN_HOME% to make sure this variable is configured at the initial stage of startup.

set SPARK_HOME=
set ZEPPELIN_HOME=<PATH to zeppelin installed folder>

Next, copy all the jars and pySpark from %SPARK_HOME%/ into the Zeppelin folder:

cp %SPARK_HOME%/jar/*.jar %ZEPPELIN_HOME%/interpreter/spark
cp %SPARK_HOME%/python/pyspark %ZEPPELIN_HOME%/interpreter/spark/pyspark

I wasn't starting interpreter.cmd while accessing the notebook, which was causing the NullPointerException. I opened two command prompts: in one I started zeppelin.cmd, and in the other interpreter.cmd.

We have to specify two additional inputs on the command line: a port and the path to Zeppelin's local_repo. You can find the local_repo path on Zeppelin's Spark interpreter page. Use exactly the same path to start interpreter.cmd.

interpreter.cmd  -d %ZEPPELIN_HOME%\interpreter\spark\ -p 5050  -l %ZEPPELIN_HOME%\local-repo\2D64VMYZE 

The host and port need to be specified on the Spark interpreter page in the Zeppelin UI. Select "Connect to external process" and set:

HOST: localhost
PORT: 5050

Once all this configuration is in place, the next step is to save and restart the Spark interpreter. Create a new notebook and type sc.version; it will print the Spark version. Note that Zeppelin 0.7.2 doesn't support Spark 2.2.1.
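
For example, in a new notebook paragraph (the version string printed depends on your install):

%spark
sc.version   // prints the running Spark version, e.g. res0: String = 2.1.0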



Answer 7:

On AWS EMR the issue was memory. I had to manually set a lower value for spark.executor.memory in the Spark interpreter using the Zeppelin UI.

The value varies based on your instance size. The best approach is to check the logs located in the /mnt/var/log/zeppelin/ folder.

In my case the error was:

Error initializing SparkContext.
java.lang.IllegalArgumentException: Required executor memory (6144+614 MB) is above the max threshold (6144 MB) of this cluster! Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'.

That helped me understand why it was failing and what I can do to fix it.
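
For illustration only, a setting like the following under the Spark interpreter's properties would keep the request below the 6144 MB cap from the log above (the exact value is an assumption; size it to your instance):

spark.executor.memory=4g   # 4096 MB + ~10% YARN overhead stays under the 6144 MB threshold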

Note:

This happened because I was starting an instance with HBase, which limits the available memory. See the defaults for instance size here.



Answer 8:

This seems to be a bug in Zeppelin 0.7.1; it works fine in 0.7.2.


