Creating hive table using parquet file metadata

Backend · Unresolved · 6 answers · 1453 views
面向向阳花 · 2021-02-01 11:01

I wrote a DataFrame out as a Parquet file, and I would like to read it with Hive, using the metadata from the Parquet file itself.

Output from the Parquet write:

    _co
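For context, here is a minimal sketch of the setup being asked about, assuming Spark with Hive support; the table name and the single `id` column are made up for illustration. The DataFrame is written as Parquet, and the goal is to avoid hand-writing the column list in the Hive DDL, deriving it from the Parquet metadata instead.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("parquet-to-hive")
      .enableHiveSupport()
      .getOrCreate()

    // Write a DataFrame out as Parquet (hypothetical single-column example)
    val df = spark.range(10).toDF("id")
    df.write.mode("overwrite").parquet("/home/gz_files/result")

    // What one would like to avoid: spelling out every column by hand
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS result (id BIGINT)
      STORED AS PARQUET
      LOCATION '/home/gz_files/result'
    """)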
6 Answers
  •  后悔当初
    2021-02-01 11:33

    I had the same question. It might be hard to implement from the practical side, though, as Parquet supports schema evolution:

    http://www.cloudera.com/content/www/en-us/documentation/archive/impala/2-x/2-0-x/topics/impala_parquet.html#parquet_schema_evolution_unique_1

    For example, you can add a new column to your table without touching the data that is already in it; only new data files will carry the new metadata (still compatible with the previous version). A sketch of this is shown below.
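    As a hedged illustration of that point (the path and column names below are invented), Spark can append files with an extra column to an existing Parquet directory; the files already there are not rewritten, and only the new files carry the wider schema:

        import org.apache.spark.sql.SparkSession

        val spark = SparkSession.builder().getOrCreate()
        import spark.implicits._

        // Original files: schema (id)
        Seq(1, 2).toDF("id").write.parquet("/tmp/evolve")

        // Later files add a column: schema (id, label); old files stay as-is
        Seq((3, "a"), (4, "b")).toDF("id", "label")
          .write.mode("append").parquet("/tmp/evolve")

        // Reading the union of both schemas needs explicit schema merging
        val merged = spark.read.option("mergeSchema", "true").parquet("/tmp/evolve")
        merged.printSchema() // id: int, label: string (null for the older rows)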

    Schema merging has been switched off by default since Spark 1.5.0, since it is a "relatively expensive operation" (http://spark.apache.org/docs/latest/sql-programming-guide.html#schema-merging), so inferring the most recent schema may not be as simple as it sounds. Quick-and-dirty approaches are quite possible, though, e.g. by parsing the output of

    $ parquet-tools schema /home/gz_files/result/000000_0
    
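    Rather than parsing the parquet-tools text, a similar quick-and-dirty result can come from Spark itself, which already exposes the Parquet footer metadata as a StructType. This is only a sketch under that assumption; the table name is invented, and catalogString is assumed to produce Hive-compatible type names for the columns involved:

        import org.apache.spark.sql.SparkSession

        val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

        // Spark reads the schema out of the Parquet footers
        val schema = spark.read.parquet("/home/gz_files/result").schema

        // Map each field to Hive column syntax, e.g. `id` bigint
        val columns = schema.fields
          .map(f => s"`${f.name}` ${f.dataType.catalogString}")
          .mkString(",\n  ")

        val ddl =
          s"""CREATE EXTERNAL TABLE IF NOT EXISTS result (
             |  $columns
             |)
             |STORED AS PARQUET
             |LOCATION '/home/gz_files/result'""".stripMargin

        spark.sql(ddl)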
