Getting NullPointerException when running Spark Code in Zeppelin 0.7.1

梦毁少年i 2021-02-05 10:59

I have installed Zeppelin 0.7.1. When I try to execute the example Spark program (available with the Zeppelin Tutorial notebook), I am getting a NullPointerException.

9 answers
  • 2021-02-05 11:24

    I was getting exactly the same exception with Zeppelin 0.7.2 on Windows 7. I had to make several configuration changes to get it working.

    First, rename zeppelin-env.cmd.template to zeppelin-env.cmd; the file is located in the %ZEPPELIN_HOME%/conf folder. Add the PYTHONPATH environment variable:

    set PYTHONPATH=%SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-0.10.4-src.zip;%SPARK_HOME%\python\lib\pyspark.zip
    

    Open zeppelin.cmd from %ZEPPELIN_HOME%/bin and add SPARK_HOME and ZEPPELIN_HOME as the first lines of the script. SPARK_HOME is left blank because I was using the embedded Spark library; I added ZEPPELIN_HOME to make sure this variable is set at the earliest stage of startup:

    set SPARK_HOME=
    set ZEPPELIN_HOME=<PATH to zeppelin installed folder>
    

    Next, copy all the jars and the pyspark package from %SPARK_HOME% into the Zeppelin Spark interpreter folder:

    cp %SPARK_HOME%/jars/*.jar %ZEPPELIN_HOME%/interpreter/spark
    cp -r %SPARK_HOME%/python/pyspark %ZEPPELIN_HOME%/interpreter/spark/pyspark
    

    I wasn't starting interpreter.cmd while accessing the notebook, and this was causing the NullPointerException. I opened two command prompts: in one I started zeppelin.cmd, and in the other interpreter.cmd.

    We have to pass the interpreter directory, a port, and the path to Zeppelin's local-repo on the command line. You can find the local-repo path on the Spark interpreter page in the Zeppelin UI; use exactly that same path when starting interpreter.cmd:

    interpreter.cmd  -d %ZEPPELIN_HOME%\interpreter\spark\ -p 5050  -l %ZEPPELIN_HOME%\local-repo\2D64VMYZE
    

    The host and port then need to be specified on the Spark interpreter page in the Zeppelin UI. Select "Connect to existing process" and enter:

    HOST : localhost
    PORT : 5050
    

    Once all these configuration changes are in place, save and restart the Spark interpreter. Then create a new notebook and type sc.version; it will print the Spark version. Note that Zeppelin 0.7.2 doesn't support Spark 2.2.1.
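
    As a quick check, a minimal test paragraph looks like this (shown as a Zeppelin Scala paragraph; the version printed depends on the Spark build your interpreter is bound to):

        %spark
        sc.version

    which should print something like res0: String = 2.1.0 (illustrative output, not a guarantee).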

  • 2021-02-05 11:25
        Caused by: java.net.ConnectException: Connection refused (Connection refused)
            at java.net.PlainSocketImpl.socketConnect(Native Method)
            at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
            at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
            at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
            at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
            at java.net.Socket.connect(Socket.java:589)
            at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
            ... 74 more
    )
            at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:466)
            at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:236)
            at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
            ... 71 more
     INFO [2017-11-20 17:51:55,288] ({pool-2-thread-4} SparkInterpreter.java[createSparkSession]:369) - Created Spark session with Hive support
    ERROR [2017-11-20 17:51:55,290] ({pool-2-thread-4} Job.java[run]:181) - Job failed
    

    It looks like the Hive Metastore service is not running. Start the Metastore service and try again:

    hive --service metastore
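
    Assuming the stock Hive configuration, the metastore listens on port 9083, so you can verify it is up before re-running the notebook:

        netstat -an | grep 9083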
    
  • 2021-02-05 11:26

    Finally, I was able to find the reason. When I checked the logs in the ZL_HOME/logs directory, it turned out to be a Spark driver binding error. I added the following property in the Spark interpreter settings and it works fine now...

    PS: This issue seems to come up mainly when you connect to a VPN... and I do connect to a VPN.
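
    The property itself was shown in a screenshot that is not reproduced in this copy. For illustration only, a driver-binding property of this kind would look like the following in the interpreter settings (the name/value pair below is an assumption, not the original answer's screenshot):

        # hypothetical example -- the original property is not preserved in this copy
        spark.driver.host    localhost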

  • 2021-02-05 11:29

    Seems to be a bug in Zeppelin 0.7.1. It works fine in 0.7.2.

  • 2021-02-05 11:33

    On AWS EMR the issue was memory. I had to manually set a lower value for spark.executor.memory in the Spark interpreter settings using the Zeppelin UI.

    The value varies based on your instance size. The best approach is to check the logs located in the /mnt/var/log/zeppelin/ folder.

    In my case the underlying error was:

    Error initializing SparkContext.
    java.lang.IllegalArgumentException: Required executor memory (6144+614 MB) is above the max threshold (6144 MB) of this cluster! Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'.
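
    The extra 614 MB in the message is YARN's executor memory overhead, which defaults to 10% of spark.executor.memory (with a 384 MB floor), so executor memory plus overhead must fit under yarn.scheduler.maximum-allocation-mb. With a 6144 MB cap, a setting like the one below would fit (the exact value is illustrative; tune it to your instance):

        # illustrative value: 5g + 512 MB overhead = 5632 MB <= 6144 MB cap
        spark.executor.memory    5g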
    

    That helped me understand why it was failing and what I could do to fix it.

    Note:

    This happened because I was starting an instance with HBase, which limits the available memory. See the defaults for each instance size here.

  • 2021-02-05 11:34

    Check whether your NameNode has gone into safe mode.

    You can check with the command below:

    sudo -u hdfs hdfs dfsadmin -safemode get
    

    To leave safe mode, use the command below:

    sudo -u hdfs hdfs dfsadmin -safemode leave
    