I'm running Spark on EMR as described in Run Spark and Spark SQL on Amazon Elastic MapReduce:
This tutorial walks you through installing and operating Spark.
You can also just add the configuration option at cluster creation, if you know you want to suppress logging for a new EMR cluster.
EMR accepts configuration options as JSON, which you can enter directly into the AWS console, or pass in via a file when using the CLI.
In this case, to change the log level to WARN, here's the JSON:
[
  {
    "classification": "spark-log4j",
    "properties": {"log4j.rootCategory": "WARN, console"}
  }
]
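One way to get that JSON into a file and sanity-check it before handing it to the CLI — a sketch, assuming python3 is available on your machine (the file name matches the CLI example that follows):

```shell
# Write the spark-log4j classification to config_file.json
cat > config_file.json <<'EOF'
[
  {
    "classification": "spark-log4j",
    "properties": {"log4j.rootCategory": "WARN, console"}
  }
]
EOF
# Fail fast if the JSON is malformed
python3 -m json.tool config_file.json > /dev/null && echo "config_file.json is valid JSON"
```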
In the console, you'd add this in the first creation step:
Or if you're creating the cluster using the CLI:
aws emr create-cluster <options here> --configurations file://config_file.json
You can read more in the EMR documentation.
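For context, a fuller invocation might look like the sketch below. Everything other than --configurations is an illustrative placeholder (cluster name, release label, instance settings), not a value from this answer:

```shell
# Sketch: create an EMR cluster with Spark installed, passing the
# spark-log4j classification from config_file.json at creation time.
# All values except --configurations are placeholders; adjust for your setup.
aws emr create-cluster \
  --name "spark-quiet-logs" \
  --release-label emr-5.0.0 \
  --applications Name=Spark \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --configurations file://config_file.json
```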
I was able to do this by editing $HOME/spark/conf/log4j.properties as desired, and calling spark-sql with --driver-java-options as follows:
./spark/bin/spark-sql --driver-java-options "-Dlog4j.configuration=file:///home/hadoop/spark/conf/log4j.properties"
I could verify that the correct file was being used by adding -Dlog4j.debug to the options:
./spark/bin/spark-sql --driver-java-options "-Dlog4j.debug -Dlog4j.configuration=file:///home/hadoop/spark/conf/log4j.properties"
cat conf/log4j.properties
# Set everything to be logged to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=WARN
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=WARN
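If you'd rather patch an existing file than write one from scratch, a sed one-liner flips the root level. This is a sketch: the printf line is only a stand-in for the real file (on EMR you'd copy Spark's bundled log4j.properties.template, which ships with INFO as the root level):

```shell
# Sketch: lower the root log4j level from INFO to WARN in a properties file.
# The printf is a stand-in for copying Spark's stock template into place.
CONF=log4j.properties
printf 'log4j.rootCategory=INFO, console\n' > "$CONF"
sed -i 's/^log4j\.rootCategory=INFO/log4j.rootCategory=WARN/' "$CONF"
cat "$CONF"   # log4j.rootCategory=WARN, console
```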