How to suppress parquet log messages in Spark?

南方客 2021-01-04 00:40

How can I stop messages like the following from appearing in my spark-shell console?

5 May, 2015 5:14:30 PM INFO: parquet.hadoop.InternalParquetRecordReader: at row 0. reading n         


        
6 answers
  • 2021-01-04 00:46

    The solution from a comment on the SPARK-8118 issue seems to work:

    You can disable the chatty output by creating a properties file with these contents:

    org.apache.parquet.handlers=java.util.logging.ConsoleHandler
    java.util.logging.ConsoleHandler.level=SEVERE
    

    And then passing the path of the file to Spark when the application is submitted. Assuming the file lives in /tmp/parquet.logging.properties (of course, that needs to be available on all worker nodes):

    spark-submit \
      --conf spark.driver.extraJavaOptions="-Djava.util.logging.config.file=/tmp/parquet.logging.properties" \
      --conf spark.executor.extraJavaOptions="-Djava.util.logging.config.file=/tmp/parquet.logging.properties" \
      ...
    

    Credits go to Justin Bailey.
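
    The two steps in this answer can be scripted. The following is a minimal Python sketch, not part of the original answer, that writes the properties file and assembles the matching spark-submit arguments. The helper names `write_logging_properties` and `spark_submit_args` are hypothetical, and the script only builds the command; it does not copy the file to the worker nodes.

```python
import tempfile
from pathlib import Path

# Properties quoted in the answer above.
PROPERTIES = (
    "org.apache.parquet.handlers=java.util.logging.ConsoleHandler\n"
    "java.util.logging.ConsoleHandler.level=SEVERE\n"
)


def write_logging_properties(path):
    """Write the java.util.logging properties file and return its path."""
    path = Path(path)
    path.write_text(PROPERTIES)
    return path


def spark_submit_args(properties_path, extra_args=()):
    """Build a spark-submit argument list that passes the file to driver and executors."""
    jvm_flag = f"-Djava.util.logging.config.file={properties_path}"
    return [
        "spark-submit",
        "--conf", f"spark.driver.extraJavaOptions={jvm_flag}",
        "--conf", f"spark.executor.extraJavaOptions={jvm_flag}",
        *extra_args,
    ]


if __name__ == "__main__":
    # A temporary directory stands in for /tmp here (assumption for the demo).
    with tempfile.TemporaryDirectory() as tmp:
        props = write_logging_properties(Path(tmp) / "parquet.logging.properties")
        print(" ".join(spark_submit_args(props)))
```

    Passing the list to `subprocess.run` avoids shell quoting problems around the `-D` flag.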

  • 2021-01-04 00:47

    Not a solution, but if you build your own Spark, this file: https://github.com/Parquet/parquet-mr/blob/master/parquet-hadoop/src/main/java/parquet/hadoop/ParquetFileReader.java contains most of the statements that generate these log messages, and you can comment them out for now.

  • 2021-01-04 00:48

    I believe this regressed; some large merges and changes are being made to the Parquet integration. See https://issues.apache.org/jira/browse/SPARK-4412

  • 2021-01-04 00:48

    To turn off all messages except ERROR, you should edit your conf/log4j.properties file, changing the following line:

    log4j.rootCategory=INFO, console
    

    into

    log4j.rootCategory=ERROR, console
    

    Hope this helps!
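
    The one-line change can also be applied with a script. This is a rough Python sketch, assuming the file contains a `log4j.rootCategory` line like the one shown; the function name `set_root_level` is hypothetical. It replaces only the level and keeps the appender list.

```python
import re


def set_root_level(properties_text, level="ERROR"):
    """Replace the level on the log4j.rootCategory line, keeping the appenders."""
    pattern = re.compile(r"^(log4j\.rootCategory\s*=\s*)\w+", flags=re.MULTILINE)
    return pattern.sub(lambda m: m.group(1) + level, properties_text)


if __name__ == "__main__":
    before = "log4j.rootCategory=INFO, console\n"
    print(set_root_level(before), end="")  # prints: log4j.rootCategory=ERROR, console
```

    Note that this silences INFO and WARN output from every logger, not only the Parquet ones.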

  • 2021-01-04 00:56

    I know this question was WRT Spark, but I recently had this issue when using Parquet with Hive in CDH 5.x and found a work-around. Details are here: https://issues.apache.org/jira/browse/SPARK-4412?focusedCommentId=16118403

    Contents of my comment from that JIRA ticket below:

    This is also an issue in the version of parquet distributed in CDH 5.x. In this case, I am using parquet-1.5.0-cdh5.8.4 (sources available here: http://archive.cloudera.com/cdh5/cdh/5)

    However, I've found a work-around for mapreduce jobs submitted via Hive. I'm sure this can be adapted for use with Spark as well.

    • Add the following properties to your job's configuration (in my case, I added them to hive-site.xml, since adding them to mapred-site.xml didn't work):

    <property>
      <name>mapreduce.map.java.opts</name>
      <value>-Djava.util.logging.config.file=parquet-logging.properties</value>
    </property>
    <property>
      <name>mapreduce.reduce.java.opts</name>
      <value>-Djava.util.logging.config.file=parquet-logging.properties</value>
    </property>
    <property>
      <name>mapreduce.child.java.opts</name>
      <value>-Djava.util.logging.config.file=parquet-logging.properties</value>
    </property>
    
    • Create a file named parquet-logging.properties with the following contents:

    # Note: I'm certain not every line here is necessary. I just added them to cover all possible
    # class/facility names. You will want to tailor this to your needs.
    .level=WARNING
    java.util.logging.ConsoleHandler.level=WARNING
    
    parquet.handlers=java.util.logging.ConsoleHandler
    parquet.hadoop.handlers=java.util.logging.ConsoleHandler
    org.apache.parquet.handlers=java.util.logging.ConsoleHandler
    org.apache.parquet.hadoop.handlers=java.util.logging.ConsoleHandler
    
    parquet.level=WARNING
    parquet.hadoop.level=WARNING
    org.apache.parquet.level=WARNING
    org.apache.parquet.hadoop.level=WARNING
    
    • Add the file to the job. In Hive, this is most easily done like so:
      ADD FILE /path/to/parquet-logging.properties;

    With this done, when you run your Hive queries, parquet should only log WARNING (and higher) level messages to the stdout container logs.
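
    Since the properties file is meant to be tailored, it can be generated from a list of logger names rather than edited by hand. Here is a minimal Python sketch along those lines; the function name `build_jul_properties` is hypothetical, and the output layout simply mirrors the file shown above.

```python
def build_jul_properties(logger_names, level="WARNING"):
    """Return java.util.logging properties text covering the given logger names."""
    handler = "java.util.logging.ConsoleHandler"
    lines = [f".level={level}", f"{handler}.level={level}", ""]
    lines += [f"{name}.handlers={handler}" for name in logger_names]
    lines.append("")
    lines += [f"{name}.level={level}" for name in logger_names]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Logger names taken from the file above; trim the list to what your job uses.
    names = ["parquet", "parquet.hadoop",
             "org.apache.parquet", "org.apache.parquet.hadoop"]
    print(build_jul_properties(names), end="")
```

    Write the result to parquet-logging.properties and add it to the job as described in the last step.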

  • This will work for Spark 2.0. Edit the file spark/log4j.properties and add:

    log4j.logger.org.apache.spark.sql.execution.datasources.parquet=ERROR
    log4j.logger.org.apache.spark.sql.execution.datasources.FileScanRDD=ERROR
    log4j.logger.org.apache.hadoop.io.compress.CodecPool=ERROR
    

    The lines for FileScanRDD and CodecPool will help with a couple of logs that are very verbose as well.
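
    If the file is managed by a script, the three lines can be appended without creating duplicates on repeated runs. This is a minimal Python sketch under that assumption; the function name `add_logger_overrides` is hypothetical, and it works on the file's text rather than on a path.

```python
PARQUET_LOGGERS = (
    "org.apache.spark.sql.execution.datasources.parquet",
    "org.apache.spark.sql.execution.datasources.FileScanRDD",
    "org.apache.hadoop.io.compress.CodecPool",
)


def add_logger_overrides(properties_text, loggers=PARQUET_LOGGERS, level="ERROR"):
    """Append log4j.logger.<name>=<level> lines for loggers not yet configured."""
    # Collect the keys already present, ignoring comment lines.
    configured = {
        line.split("=", 1)[0].strip()
        for line in properties_text.splitlines()
        if "=" in line and not line.lstrip().startswith("#")
    }
    missing = [
        f"log4j.logger.{name}={level}"
        for name in loggers
        if f"log4j.logger.{name}" not in configured
    ]
    if not missing:
        return properties_text
    # Add a newline first only if the existing text does not already end with one.
    separator = "" if properties_text.endswith("\n") or not properties_text else "\n"
    return properties_text + separator + "\n".join(missing) + "\n"
```

    Loggers that already have an entry are left untouched, even if their level differs from `level`.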
