In my application I need to create a single-row DataFrame from a Map, so that a Map like ("col1" -> 5, "col2" -> 10, "col3" -> ...) becomes a DataFrame whose columns are the keys and whose single row holds the values.
Here you go:
import org.apache.spark.sql.functions.lit
import spark.implicits._  // for toDF; assumes a SparkSession named `spark` (as in spark-shell)

val map: Map[String, Int] = Map("col1" -> 5, "col2" -> 6, "col3" -> 10)
// Seed with the first entry as a one-row DataFrame, then fold the remaining entries in as literal columns.
val df = map.tail
  .foldLeft(Seq(map.head._2).toDF(map.head._1))((acc, curr) => acc.withColumn(curr._1, lit(curr._2)))
df.show()
+----+----+----+
|col1|col2|col3|
+----+----+----+
| 5| 6| 10|
+----+----+----+
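As a follow-up note: if the map has many entries, a single select with literal columns does the same thing in one projection instead of one withColumn call per entry. A rough sketch under the same assumptions (a SparkSession named `spark` and the same `map`):
import org.apache.spark.sql.functions.lit
import spark.implicits._

// One projection: turn every map entry into a lit(value) column aliased to its key.
val df2 = Seq(1).toDF("dummy")
  .select(map.map { case (k, v) => lit(v).as(k) }.toSeq: _*)
df2.show()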
A slight variation on Rapheal's answer. You can create a 1×1 dummy-column DataFrame, add the map entries using foldLeft, and finally drop the dummy column. That way the foldLeft is straightforward and easy to remember.
import org.apache.spark.sql.functions.lit
import spark.implicits._  // assumes a SparkSession named `spark`

val map: Map[String, Int] = Map("col1" -> 5, "col2" -> 6, "col3" -> 10)
val f = Seq("1").toDF("dummy")  // 1×1 dummy DataFrame to fold onto
map.keys.toList.sorted.foldLeft(f) { (acc, x) => acc.withColumn(x, lit(map(x))) }.drop("dummy").show(false)
+----+----+----+
|col1|col2|col3|
+----+----+----+
|5 |6 |10 |
+----+----+----+
I sorted the column names as well; it doesn't hurt and gives a deterministic column order.
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val map = Map("col1" -> 5, "col2" -> 6, "col3" -> 10)
// Split the map into parallel lists of keys (column names) and values.
val (keys, values) = map.toList.sortBy(_._1).unzip
// The values become a single-Row RDD; the keys become the schema.
val rows = spark.sparkContext.parallelize(Seq(Row(values: _*)))
val schema = StructType(keys.map(k => StructField(k, IntegerType, nullable = false)))
val df = spark.createDataFrame(rows, schema)
df.show()
Gives:
+----+----+----+
|col1|col2|col3|
+----+----+----+
| 5| 6| 10|
+----+----+----+
The idea is straightforward: convert the map to a list of tuples, unzip it, turn the keys into a schema and the values into a single-entry row RDD, then build the DataFrame from the two pieces (the interface for createDataFrame is a bit strange there: it accepts java.util.Lists and kitchen sinks, but doesn't accept the usual Scala List for some reason).
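If you'd rather avoid the RDD, here is a minimal sketch using the java.util.List overload of createDataFrame mentioned above. It assumes the same `values` and `schema` as in the code block and a SparkSession named `spark`:
import scala.collection.JavaConverters._
import org.apache.spark.sql.Row

// Same single Row as before, but passed as a java.util.List instead of an RDD.
val df2 = spark.createDataFrame(Seq(Row(values: _*)).asJava, schema)
df2.show()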