Spark Shell - __spark_libs__.zip does not exist

Asked by 感动是毒 on 2020-12-16 23:00

I'm new to Spark, and I'm busy setting up a Spark cluster with HA enabled.

When starting a spark shell for testing via: bash spark-shell --master yarn --deploy

3 Answers
  • 2020-12-16 23:43

    I do not see any errors in your logs, only warnings, which you can avoid by adding these environment variables:

    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
    

    For the exception, try setting the Spark configuration for YARN manually: http://badrit.com/blog/2015/2/29/running-spark-on-yarn#.WD_e66IrJsM

    hdfs dfs -mkdir -p /user/spark/share/lib
    hdfs dfs -put $SPARK_HOME/assembly/lib/spark-assembly_*.jar /user/spark/share/lib/spark-assembly.jar
    export SPARK_JAR=hdfs://your-server:port/user/spark/share/lib/spark-assembly.jar
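
    Note that SPARK_JAR and the spark-assembly jar only exist on Spark 1.x. On Spark 2.x, the __spark_libs__*.zip in the error is the archive Spark builds and uploads automatically when neither spark.yarn.jars nor spark.yarn.archive is set, so a rough equivalent of the above (just a sketch, assuming Spark 2.x and an HDFS path of your choosing) is to stage $SPARK_HOME/jars on HDFS yourself and reference it via spark.yarn.archive:

    # Sketch for Spark 2.x; the archive name and HDFS path are assumptions, adjust to your cluster.
    # Package the jars that would otherwise be zipped into __spark_libs__*.zip on every launch:
    jar cv0f spark-libs.jar -C $SPARK_HOME/jars/ .
    hdfs dfs -mkdir -p /user/spark/share/lib
    hdfs dfs -put spark-libs.jar /user/spark/share/lib/
    # Then point Spark at it, e.g. in ./conf/spark-defaults.conf:
    # spark.yarn.archive  hdfs://your-server:port/user/spark/share/lib/spark-libs.jar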
    

    Hope this helps.

  • 2020-12-16 23:56

    You must have set master("local[*]") on your SparkSession in your application code; settings made in code take precedence over the --master yarn flag. I removed it and it worked.

  • 2020-12-16 23:58

    This error was due to the config in the core-site.xml file.

    Please note that for Spark to find this file, your HADOOP_CONF_DIR env variable must be set.

    In my case, I added HADOOP_CONF_DIR=/opt/hadoop-2.7.3/etc/hadoop/ to ./conf/spark-env.sh, as shown below.
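
    For reference, a minimal sketch of that spark-env.sh entry (the Hadoop path is the one from my install; adjust it to yours):

    # ./conf/spark-env.sh
    export HADOOP_CONF_DIR=/opt/hadoop-2.7.3/etc/hadoop/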

    See: Spark Job running on Yarn Cluster java.io.FileNotFoundException: File does not exits , eventhough the file exits on the master node

    core-site.xml

    <configuration>
        <property>
            <name>fs.default.name</name>
            <value>hdfs://master:9000</value>
        </property> 
    </configuration>
    

    If this endpoint is unreachable, or if Spark detects that the source and destination file systems are the same, the lib files will not be distributed to the other nodes in your cluster, causing the errors above.

    In my situation, the node I was on could not reach port 9000 on the specified host; a quick check is sketched below.
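
    One way to check both what endpoint is configured and whether it is reachable (a sketch; master and 9000 are the values from the core-site.xml above):

    # Print the fs.defaultFS that Hadoop resolves from core-site.xml
    hdfs getconf -confKey fs.defaultFS
    # Check that the NameNode port is reachable from this node
    nc -zv master 9000
    # Or try listing that file system directly
    hdfs dfs -ls hdfs://master:9000/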

    Debugging

    Turn the log level up to INFO. You can do this by:

    1. Copy ./conf/log4j.properties.template to ./conf/log4j.properties

    2. In that file, set log4j.logger.org.apache.spark.repl.Main=INFO, as sketched after this list.
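
    Put together, the two steps look like this (a sketch; run it from your Spark installation directory):

    cp ./conf/log4j.properties.template ./conf/log4j.properties
    # Then, in ./conf/log4j.properties, set (or add) this line:
    # log4j.logger.org.apache.spark.repl.Main=INFO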

    Start your Spark Shell as normal. If your issue is the same as mine, you should see an info message such as: INFO Client: Source and destination file systems are the same. Not copying file:/tmp/spark-c1a6cdcd-d348-4253-8755-5086a8931e75/__spark_libs__1391186608525933727.zip

    This should lead you to the problem, as it is the start of the chain reaction that results in the missing-file errors.
