If I wanted to create a StructType (i.e. a DataFrame.schema) out of a case class, is there a way to do it without creating a DataFrame first?
I know this question is almost a year old, but I came across it and thought others who do as well might want to know that I have just learned this approach:
import org.apache.spark.sql.Encoders
val mySchema = Encoders.product[MyCaseClass].schema
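For example, a minimal sketch (MyCaseClass is a hypothetical case class used only for illustration):

import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types.StructType

case class MyCaseClass(id: Long, name: String)

val mySchema: StructType = Encoders.product[MyCaseClass].schema
mySchema.printTreeString()
// prints something like:
// root
//  |-- id: long (nullable = false)
//  |-- name: string (nullable = true)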
Instead of manually reproducing the logic for creating the implicit Encoder object that gets passed to toDF, one can use that encoder directly (or, more precisely, implicitly, in the same way as toDF does):
// spark: SparkSession
import org.apache.spark.sql.Encoder
import spark.implicits._

implicitly[Encoder[MyCaseClass]].schema
Unfortunately, this actually suffers from the same problem as using org.apache.spark.sql.catalyst or Encoders as in the other answers: the Encoder trait is experimental.
How does this work? The toDF method on Seq comes from a DatasetHolder, which is created via the implicit localSeqToDatasetHolder that is imported via spark.implicits._. That function is defined as:
implicit def localSeqToDatasetHolder[T](s: Seq[T])(implicit arg0: Encoder[T]): DatasetHolder[T]
As you can see, it takes an implicit Encoder[T] argument, which, for a case class, can be computed via newProductEncoder (also imported via spark.implicits._). We can reproduce this implicit logic to get an Encoder for our case class via the convenience scala.Predef.implicitly (in scope by default, because it comes from Predef), which simply returns its requested implicit argument:
def implicitly[T](implicit e: T): T
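To make the resolution concrete, here is a rough sketch of what the compiler ends up doing, assuming the same hypothetical MyCaseClass as above (newProductEncoder can also be called explicitly):

// spark: SparkSession
import org.apache.spark.sql.Encoder

val enc: Encoder[MyCaseClass] = spark.implicits.newProductEncoder[MyCaseClass]
enc.schema  // same StructType as implicitly[Encoder[MyCaseClass]].schema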
In case someone wants to do this for a custom Java bean:
ExpressionEncoder.javaBean(Event.class).schema().json()
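If you are working from Scala instead, a rough equivalent sketch would be the following (Event here is a hypothetical bean-style class; @BeanProperty generates the getters and setters the bean encoder inspects):

import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import scala.beans.BeanProperty

class Event {
  @BeanProperty var name: String = _
  @BeanProperty var timestamp: Long = _
}

val beanSchema = ExpressionEncoder.javaBean(classOf[Event]).schema
beanSchema.json  // JSON representation of the schema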
You can do it the same way SQLContext.createDataFrame does it:
import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types.StructType

val schema = ScalaReflection.schemaFor[TestCase].dataType.asInstanceOf[StructType]
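For instance, assuming a hypothetical case class TestCase(a: String, b: Int), the resulting schema would print roughly as:

// root
//  |-- a: string (nullable = true)
//  |-- b: integer (nullable = false)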