I have two columns in a Spark SQL DataFrame, with each entry in either column an array of strings. For example:

import org.apache.spark.sql.functions._
import spark.implicits._

val ngramDataFrame = Seq(
  // sample data matching the output below; the word "books" is illustrative,
  // as the original example was truncated
  (Seq("curious", "bought", "books"), Seq("iwa", "was", "asj"))
).toDF("filtered_words", "ngrams_array")
In Spark 2.4 or later you can use concat (if you want to keep duplicates):
ngramDataFrame.withColumn(
"full_array", concat($"filtered_words", $"ngrams_array")
).show
+--------------------+---------------+--------------------+
| filtered_words| ngrams_array| full_array|
+--------------------+---------------+--------------------+
|[curious, bought,...|[iwa, was, asj]|[curious, bought,...|
+--------------------+---------------+--------------------+
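Note that concat returns NULL as soon as either input array is NULL. If you would rather treat a missing array as empty, you can coalesce the inputs first; a minimal sketch, using typedLit to build an empty array literal:

ngramDataFrame.withColumn(
  "full_array",
  concat(
    // fall back to an empty array when the column is NULL
    coalesce($"filtered_words", typedLit(Seq.empty[String])),
    coalesce($"ngrams_array", typedLit(Seq.empty[String]))
  )
)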
Alternatively, you can use array_union (if you want to drop duplicates):
ngramDataFrame.withColumn(
"full_array",
array_union($"filtered_words", $"ngrams_array")
)
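To make the deduplication visible, here is a small self-contained example (the values are mine, not from the question):

Seq((Seq("a", "b"), Seq("b", "c")))
  .toDF("xs", "ys")
  .select(array_union($"xs", $"ys"))
  .show
+-------------------+
|array_union(xs, ys)|
+-------------------+
|          [a, b, c]|
+-------------------+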
These can also be composed from other collection functions, for example
ngramDataFrame.withColumn(
"full_array",
flatten(array($"filtered_words", $"ngrams_array"))
)
with duplicates, and
ngramDataFrame.withColumn(
"full_array",
array_distinct(flatten(array($"filtered_words", $"ngrams_array")))
)
without.
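If you prefer SQL expressions, the same composition can be written with selectExpr; a sketch:

ngramDataFrame.selectExpr(
  "filtered_words",
  "ngrams_array",
  "array_distinct(flatten(array(filtered_words, ngrams_array))) AS full_array"
)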
On a side note, you shouldn't use WrappedArray when working with ArrayType columns. Instead you should expect the guaranteed interface, which is Seq. So the udf should use a function with the following signature:

(Seq[String], Seq[String]) => Seq[String]
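For example, on versions before 2.4, a udf along these lines should work (concatArrays is a name I made up; a minimal sketch that assumes non-null arrays):

import org.apache.spark.sql.functions.udf

// Spark hands ArrayType column values to Scala udfs as Seq,
// so accept Seq and return Seq
val concatArrays = udf((xs: Seq[String], ys: Seq[String]) => xs ++ ys)

ngramDataFrame.withColumn(
  "full_array", concatArrays($"filtered_words", $"ngrams_array")
)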
Please refer to the SQL Programming Guide for details.