How do I stop messages like the following from appearing on my spark-shell console?
5 May, 2015 5:14:30 PM INFO: parquet.hadoop.InternalParquetRecordReader: at row 0. reading n
The solution from a SPARK-8118 issue comment seems to work:
You can disable the chatty output by creating a properties file with these contents:
org.apache.parquet.handlers=java.util.logging.ConsoleHandler
java.util.logging.ConsoleHandler.level=SEVERE
Then pass the path of the file to Spark when the application is submitted. Assuming the file lives at /tmp/parquet.logging.properties (which, of course, needs to be available on all worker nodes):
spark-submit \
--conf spark.driver.extraJavaOptions="-Djava.util.logging.config.file=/tmp/parquet.logging.properties" \
--conf spark.executor.extraJavaOptions="-Djava.util.logging.config.file=/tmp/parquet.logging.properties" \
...
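Since the question is about the spark-shell console, the same conf options should work when launching the shell directly; a minimal sketch, assuming the same /tmp/parquet.logging.properties path as above:
spark-shell \
--conf spark.driver.extraJavaOptions="-Djava.util.logging.config.file=/tmp/parquet.logging.properties" \
--conf spark.executor.extraJavaOptions="-Djava.util.logging.config.file=/tmp/parquet.logging.properties"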
Credits go to Justin Bailey.
Not a solution, but if you build your own Spark, this file: https://github.com/Parquet/parquet-mr/blob/master/parquet-hadoop/src/main/java/parquet/hadoop/ParquetFileReader.java contains most of the log message generation, which you can comment out for now.
I believe this regressed; there are some large merges/changes being made to the Parquet integration: https://issues.apache.org/jira/browse/SPARK-4412
To turn off all messages except ERROR, you should edit your conf/log4j.properties file, changing the following line:
log4j.rootCategory=INFO, console
into
log4j.rootCategory=ERROR, console
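If you prefer not to edit the file, the root level can also be raised at runtime from inside spark-shell (a minimal sketch; note this only adjusts log4j, so it won't silence Parquet messages emitted through java.util.logging):
// inside spark-shell: raise the log4j root logger to ERROR at runtime
sc.setLogLevel("ERROR")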
Hope it helps!
I know this question was WRT Spark, but I recently had this issue when using Parquet with Hive in CDH 5.x and found a work-around. Details are here: https://issues.apache.org/jira/browse/SPARK-4412?focusedCommentId=16118403
Contents of my comment from that JIRA ticket below:
This is also an issue in the version of Parquet distributed in CDH 5.x. In this case, I am using parquet-1.5.0-cdh5.8.4 (sources available here: http://archive.cloudera.com/cdh5/cdh/5).
However, I've found a work-around for mapreduce jobs submitted via Hive. I'm sure this can be adapted for use with Spark as well.
- Add the following properties to your job's configuration (in my case, I added them to hive-site.xml since adding them to mapred-site.xml didn't work):
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Djava.util.logging.config.file=parquet-logging.properties</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Djava.util.logging.config.file=parquet-logging.properties</value>
</property>
<property>
  <name>mapreduce.child.java.opts</name>
  <value>-Djava.util.logging.config.file=parquet-logging.properties</value>
</property>
- Create a file named parquet-logging.properties with the following contents:
# Note: I'm certain not every line here is necessary. I just added them to cover all possible
# class/facility names. You will want to tailor this as per your needs.
.level=WARNING
java.util.logging.ConsoleHandler.level=WARNING
parquet.handlers=java.util.logging.ConsoleHandler
parquet.hadoop.handlers=java.util.logging.ConsoleHandler
org.apache.parquet.handlers=java.util.logging.ConsoleHandler
org.apache.parquet.hadoop.handlers=java.util.logging.ConsoleHandler
parquet.level=WARNING
parquet.hadoop.level=WARNING
org.apache.parquet.level=WARNING
org.apache.parquet.hadoop.level=WARNING
- Add the file to the job. In Hive, this is most easily done like so:
ADD FILE /path/to/parquet-logging.properties;
With this done, when you run your Hive queries, Parquet should only log WARNING (and higher) level messages to the stdout container logs.
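As a sketch of how the same work-around might be adapted for Spark (the --files distribution and the relative path on executors are assumptions here, not something I verified on CDH):
spark-submit \
--files /path/to/parquet-logging.properties \
--conf spark.driver.extraJavaOptions="-Djava.util.logging.config.file=/path/to/parquet-logging.properties" \
--conf spark.executor.extraJavaOptions="-Djava.util.logging.config.file=parquet-logging.properties" \
...
The driver reads the full local path, while --files ships the file into each executor's working directory so the bare file name resolves there.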
This will work for Spark 2.0. Edit the file spark/log4j.properties and add:
log4j.logger.org.apache.spark.sql.execution.datasources.parquet=ERROR
log4j.logger.org.apache.spark.sql.execution.datasources.FileScanRDD=ERROR
log4j.logger.org.apache.hadoop.io.compress.CodecPool=ERROR
The lines for FileScanRDD and CodecPool will help with a couple of logs that are very verbose as well.
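If you'd rather not touch the properties file, the equivalent can usually be set programmatically from spark-shell or at the top of your job; a minimal sketch using the log4j 1.x API bundled with Spark 2.0:
import org.apache.log4j.{Level, Logger}
// silence the same three loggers listed above at runtime
Logger.getLogger("org.apache.spark.sql.execution.datasources.parquet").setLevel(Level.ERROR)
Logger.getLogger("org.apache.spark.sql.execution.datasources.FileScanRDD").setLevel(Level.ERROR)
Logger.getLogger("org.apache.hadoop.io.compress.CodecPool").setLevel(Level.ERROR)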