Problems to create DataFrame from Rows containing Option[T]

Asked 2021-01-22 19:27 by 猫巷女王i · 2 answers · 630 views

I'm migrating some code from Spark 1.6 to Spark 2.1 and struggling with the following issue:

This worked perfectly in Spark 1.6

import org.apache.spark.         
2 Answers
  • 2021-01-22 19:53

    The error message is clear: a Some is being passed where a bigint is required:

    scala.Some is not a valid external type for schema of bigint
    

    So you need to combine Option with getOrElse, which supplies null when the Option is empty. The following code should work for you:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{LongType, StructField, StructType}

    val sc = ss.sparkContext
    val sqlContext = ss.sqlContext
    val schema = StructType(Seq(StructField("i", LongType, nullable = true)))
    val rows = sc.parallelize(Seq(Row(Option(1L).getOrElse(null))))
    sqlContext.createDataFrame(rows, schema).show
    

    I hope this answer is helpful
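    The `getOrElse(null)` call is the key step: `Row` expects the plain, possibly-null value, not the `Option` wrapper. A minimal pure-Scala sketch of that unwrapping (no Spark required; the values here are illustrative):

    ```scala
    // Unwrap each Option to the raw value Row expects, substituting null for None.
    // (Option.orNull would be more idiomatic, but it does not compile for
    // primitive element types such as Long, so getOrElse(null) is used instead.)
    val values: Seq[Option[Long]] = Seq(Some(1L), None, Some(3L))
    val unwrapped: Seq[Any] = values.map(_.getOrElse(null))
    // unwrapped == Seq(1L, null, 3L)
    ```

    Each unwrapped element can then be placed directly in a `Row` against a schema whose fields are declared `nullable = true`.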

  • 2021-01-22 20:04

    There is actually a JIRA ticket about this, SPARK-19056, which was resolved as not a bug.

    So this behavior is intentional.

    Allowing Option in Row was never documented, and it causes a lot of trouble when the encoder framework is applied to all typed operations. Since Spark 2.0, please use Dataset for typed operations and custom objects, e.g.:

    import spark.implicits._ // assumes a SparkSession in scope as `spark`

    val ds = Seq(1 -> None, 2 -> Some("str")).toDS
    ds.toDF // schema: <_1: int, _2: string>
    
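    The same None/Some handling works through a case class, which gives the columns meaningful names. A hedged sketch (assumes a SparkSession in scope named `spark`; `Record` is a hypothetical class name, not from the original thread):

    ```scala
    // Hypothetical example: Option fields in a case class become nullable columns.
    // Assumes a SparkSession is available as `spark`.
    import spark.implicits._

    case class Record(id: Int, label: Option[String])

    val ds = Seq(Record(1, None), Record(2, Some("str"))).toDS
    ds.toDF.printSchema
    // `label` is inferred as a nullable string column; None rows become null.
    ```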