spark-submit, how to specify log4j.properties

旧巷少年郎 2021-02-02 00:04

In spark-submit, how do I specify log4j.properties?

Here is my script. I have tried all combinations and even just using a single local node, but it looks like the log4j.properties is not being picked up.

6 Answers
  • 2021-02-02 00:13

    Just to add: you can pass the conf directly via spark-submit; there is no need to modify the defaults conf file.

    --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///export/home/siva/log4j.properties

    I ran the command below and it worked fine:

    /usr/hdp/latest/spark2/bin/spark-submit --master local[*] --files ~/log4j.properties --conf spark.sql.catalogImplementation=hive --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///export/home/siva/log4j.properties ~/SCD/spark-scd-assembly-1.0.jar test_run

    Note: if you already have extra Java options configured in the conf file, just append this one and submit, as sketched below.
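
    For example, assuming spark-defaults.conf already carried a hypothetical -XX:+UseG1GC flag, the merged entry might look like this (the file path is the one from the command above):

    spark.driver.extraJavaOptions  -XX:+UseG1GC -Dlog4j.configuration=file:///export/home/siva/log4j.properties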

  • 2021-02-02 00:20

    If this is just a self-learning or small development project, there is already a log4j.properties in hadoop_home/conf. Just edit that one and add your own loggers, for example as sketched below.
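
    A minimal sketch of such an edit, assuming your application logs under a package called my.app (a placeholder name):

    # appended to the existing hadoop_home/conf/log4j.properties
    # more detail from your own code
    log4j.logger.my.app=DEBUG
    # less noise from Spark internals
    log4j.logger.org.apache.spark=WARN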

  • 2021-02-02 00:31
    1. Copy the spark-defaults.conf to a new app-spark-defaults.conf
    2. Add -Dlog4j.configuration=file:log4j.properties to the spark.driver.extraJavaOptions in the app-spark-defaults.conf (note: file: without slashes, so the relative path is resolved; file://log4j.properties would be parsed as a host name). For example:

      spark.driver.extraJavaOptions -XXOther_flag -Dlog4j.configuration=file:log4j.properties

    3. Run spark-submit with --properties-file pointing to the new conf file. For example:
      spark-submit --properties-file app-spark-defaults.conf --class my.app.class --master yarn --deploy-mode client ~/my-jar.jar

  • 2021-02-02 00:34

    Note that the Spark worker is not your Java application, so you can't rely on a log4j.properties file picked up from the classpath.

    To understand how Spark on YARN will read a log4j.properties file, you can use the log4j.debug=true flag:

    spark.executor.extraJavaOptions=-Dlog4j.debug=true
    

    Most of the time, the error is that the file is not found/available from the worker's YARN container. There is a very useful Spark option that lets you ship files: --files.

    --files "./log4j.properties"
    

    This makes the file available to the driver and all workers. Then add the Java extra option:

    -Dlog4j.configuration=log4j.properties
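
    Putting the pieces together, a full invocation could look like the following sketch (my-app.jar is a placeholder; cluster deploy mode is assumed so the driver also runs in a YARN container, and -Dlog4j.debug=true can be dropped once the file is found):

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --files "./log4j.properties" \
      --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties -Dlog4j.debug=true" \
      --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties -Dlog4j.debug=true" \
      my-app.jar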
    

    Et voilà!

    log4j: Using URL [file:/var/log/ambari-server/hadoop/yarn/local/usercache/hdfs/appcache/application_1524817715596_3370/container_e52_1524817715596_3370_01_000002/log4j.properties] for automatic log4j configuration.
    
  • 2021-02-02 00:36

    How to pass a local log4j.properties file

    As I can see from your script, you want to:

    1. Pass a local log4j.properties file to the executors
    2. Use this file for each node's configuration

    Note two things about the --files setting:

    1. Files uploaded to the Spark cluster with --files will be available in each container's root (working) directory, so there is no need to add any path in file:log4j.properties.
    2. Files listed in --files must be provided with an absolute path!

    Fixing your snippet is very easy now:

    current_dir=/tmp
    log4j_setting="-Dlog4j.configuration=file:log4j.properties"
    
    spark-submit \
    ...
    --conf "spark.driver.extraJavaOptions=${log4j_setting}" \
    --conf "spark.executor.extraJavaOptions=${log4j_setting}" \
    --class "my.AppMain" \
    --files ${current_dir}/log4j.properties \
    ...
    $current_dir/my-app-SNAPSHOT-assembly.jar
    

    Need more?

    If you would like to read about other ways of configuring logging while using spark-submit, please visit my other detailed answer: https://stackoverflow.com/a/55596389/1549135

  • 2021-02-02 00:38

    Solution for spark-on-yarn

    For me, running Spark on YARN, just adding --files log4j.properties made everything work.
    1. Make sure the directory from which you run spark-submit contains the file "log4j.properties".
    2. Run spark-submit ... --files log4j.properties

    Let's see why this works.

    1. spark-submit uploads log4j.properties to HDFS, like this:

    20/03/31 01:22:51 INFO Client: Uploading resource file:/home/ssd/homework/shaofengfeng/tmp/firesparkl-1.0/log4j.properties -> hdfs://sandbox/user/homework/.sparkStaging/application_1580522585397_2668/log4j.properties
    

    2. When YARN launches containers for the driver or executors, it downloads all uploaded files into the node's local file cache, including files under ${spark_home}/jars, ${spark_home}/conf and ${hadoop_conf_dir}, and files specified by --jars and --files.
    3. Before launching the container, YARN exports the classpath and makes soft links, like this:

    export CLASSPATH="$PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*"
    
    ln -sf "/var/hadoop/yarn/local/usercache/homework/filecache/1484419/log4j.properties" "log4j.properties"
    hadoop_shell_errorcode=$?
    if [ $hadoop_shell_errorcode -ne 0 ]
    then
      exit $hadoop_shell_errorcode
    fi
    ln -sf "/var/hadoop/yarn/local/usercache/homework/filecache/1484440/apache-log4j-extras-1.2.17.jar" "apache-log4j-extras-1.2.17.jar"
    

    4. After step 3, "log4j.properties" is already on the CLASSPATH, so there is no need to set spark.driver.extraJavaOptions or spark.executor.extraJavaOptions. A minimal submit is sketched below.
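
    A sketch of such a submit, assuming it is run from the directory containing log4j.properties (the class and jar names are placeholders):

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --files log4j.properties \
      --class my.app.Main \
      my-app.jar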
