How to use merge() to update a table in R


I'm trying to figure out how to use merge() to update a database.

Here is an example. Take the data frame foo


                      
6 answers

一生所求
2020-11-27 21:22

Another approach could be:

(1) Remove the NAs from the first data frame.
(2) Use rbind() to append the new data instead of using merge().


These are the original two data frames:

foo <- data.frame(index=c('a', 'b', 'c', 'd'), value=c(100, 101, NA, NA))
bar <- data.frame(index=c('c', 'd'), value=c(200, 201))


(1) Use the negation of is.na to remove the NAs:

foo_new <- foo[!is.na(foo$value),]


(2) Bind the data frames to get the updated table:

new_df <- rbind(foo_new,bar)

new_df
  index value
1     a   100
2     b   101
3     c   200
4     d   201
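
The two steps above work because every row of bar corresponds to an NA row of foo. As a slightly more general sketch, assuming index uniquely identifies rows in both data frames, one could drop from foo every row whose index appears in bar (whether or not its value is NA) and then bind. The helper name update_by_rbind is hypothetical, not part of the original answer.

```r
foo <- data.frame(index = c("a", "b", "c", "d"),
                  value = c(100, 101, NA, NA),
                  stringsAsFactors = FALSE)
bar <- data.frame(index = c("c", "d"),
                  value = c(200, 201),
                  stringsAsFactors = FALSE)

# Hypothetical helper: keep the rows of `old` that `new` does not replace,
# then append all rows of `new`.
update_by_rbind <- function(old, new, key = "index") {
  kept <- old[!(old[[key]] %in% new[[key]]), , drop = FALSE]
  out <- rbind(kept, new)
  rownames(out) <- NULL  # renumber the rows 1..n
  out
}

new_df <- update_by_rbind(foo, bar)
```

Note that the replaced rows end up at the bottom of the result, so the original row order is only preserved when the updated rows were already last.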

                                                                        
                                                        
            
            
              
                
南旧
2020-11-27 21:33

Doesn't merge() always bind columns together? Does replace() work?

foo$value <- replace(foo$value, foo$index %in% bar$index, bar$value)


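or use match(), which pairs each row of bar with the row of foo that has the same index, so the result does not depend on the row order of bar: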

foo$value[match(bar$index, foo$index)] <- bar$value
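
To illustrate why the order matters, here is a minimal sketch comparing the two lines above when the updates arrive in a different order than foo. The data frame bar_rev and the variables by_position, by_key, pos and hit are made up for this illustration. The is.na() guard is an addition: it skips any index in the update table that foo does not contain, which would otherwise produce NA subscripts.

```r
foo <- data.frame(index = c("a", "b", "c", "d"),
                  value = c(100, 101, NA, NA),
                  stringsAsFactors = FALSE)
# Same updates as bar, but listed in reverse order (illustrative data).
bar_rev <- data.frame(index = c("d", "c"),
                      value = c(201, 200),
                      stringsAsFactors = FALSE)

# replace() with %in% fills the matching positions in foo's order,
# so the values land on the wrong rows here.
by_position <- replace(foo$value, foo$index %in% bar_rev$index, bar_rev$value)

# match() looks up the row of foo for each index in bar_rev.
pos <- match(bar_rev$index, foo$index)
hit <- !is.na(pos)            # ignore indices that foo does not have
by_key <- foo$value
by_key[pos[hit]] <- bar_rev$value[hit]
```

Here by_position assigns 201 to c and 200 to d, while by_key assigns 200 to c and 201 to d as intended.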

                                                                        
                                                        
            
            
              
                
忘掉有多难
2020-11-27 21:33

A concise solution using a data.table update join:

library(data.table)
setDT(foo)
setDT(bar)
foo[bar, on="index", value:=i.value]
foo
#   index value
#1:     a   100
#2:     b   101
#3:     c   200
#4:     d   201


The first argument of the data.table [ method is named i, so columns of the table passed as i (here bar) can be referred to with the i. prefix.
                                                                        
                                                        
            
            
              
                
眼角桃花
2020-11-27 21:36

I think the simplest way is to "mark" the values that need to be updated before the merge.

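bar$update <- TRUE  # flag every row that carries an update
# all=TRUE keeps unmatched rows from both sides; bar's value becomes value.update
foo <- merge(foo, bar, by='index', all=TRUE, suffixes=c("",".update"))
# overwrite value only where the flag is set
foo[!is.na(foo$update),]$value <- foo[!is.na(foo$update),]$value.update
foo$value.update <- NULL
foo$update <- NULL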


The same approach would typically be faster on large tables using data.table:

library(data.table)
foo <- as.data.table(foo)
bar <- as.data.table(bar)
bar[, update:=TRUE]
foo <- merge(foo, bar, by='index', all=TRUE, suffixes=c("",".update"))
foo[!is.na(update),value:=value.update]
foo[, c("value.update","update"):=NULL]
foo

   index value
1:     a   100
2:     b   101
3:     c   200
4:     d   201
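
For reuse, the base R "mark, merge, overwrite" steps from this answer could be wrapped in a small function. This is only a sketch under the assumption that the key uniquely identifies rows in both tables. The name merge_update, its parameters and the temporary column .update are hypothetical.

```r
foo <- data.frame(index = c("a", "b", "c", "d"),
                  value = c(100, 101, NA, NA),
                  stringsAsFactors = FALSE)
bar <- data.frame(index = c("c", "d"),
                  value = c(200, 201),
                  stringsAsFactors = FALSE)

# Hypothetical helper following the marking approach described above.
merge_update <- function(old, new, key = "index", col = "value") {
  new$.update <- TRUE                      # mark rows that carry an update
  m <- merge(old, new, by = key, all = TRUE, suffixes = c("", ".update"))
  hit <- !is.na(m$.update)                 # rows that came from `new`
  new_col <- paste0(col, ".update")
  m[hit, col] <- m[hit, new_col]           # overwrite only the marked rows
  m[[new_col]] <- NULL                     # drop the helper columns
  m$.update <- NULL
  m
}

result <- merge_update(foo, bar)
```

Because all = TRUE is kept from the original code, rows of the update table whose key is missing from the old table are appended rather than dropped. Using the flag instead of testing the merged value for NA also means an update whose new value is NA is still applied.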

                                                                        
                                                        
            
            
              
                
独厮守ぢ
2020-11-27 21:44

I would also like to present an SQL solution using the sqldf library and the SQLite database integrated with R. I like the simplicity, accuracy and power of SQL.

Accuracy: I can define exactly which rows I want to change, without relying on the row order of the data frame (foo.id = bar.id).

Power: in the WHERE inside the SET subquery and in the outer WHERE (third line) I can state every condition the update should respect.

Simplicity: the syntax is more readable than indexing into vectors, matrices or data frames.

library(sqldf)

# I changed index to id since index does not work:
#   INDEX is a keyword in SQLite.

(foo <- data.frame(id=c('a', 'b', 'c', 'd'), value=c(100, 101, NA, NA)))
(bar <- data.frame(id=c('c', 'd'), value=c(200, 201)))

sqldf(c(paste("UPDATE foo"
             ," SET value = (SELECT bar.value FROM bar WHERE foo.id = bar.id)"
             ," WHERE value IS NULL"
             )
        , " SELECT * FROM main.foo"
    )
)


Which gives

  id value
1  a   100
2  b   101
3  c   200
4  d   201


Similar issues:

r equivalent of sql update?

R sqlite: update with two tables
                                                                        
                                                        
            
            
              
                
余生分开走
2020-11-27 21:45

merge() only merges in new data; it joins columns from one data set onto another and does not overwrite existing values. For instance, if you had a data set of average income for a few cities, and a separate data set of the populations of those cities, you would use merge() to merge one set of data into the other.

Like apeescape said, replace() is probably what you want.
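
As a small illustration of the income and population case described above, here is a sketch with made-up city names and numbers (they are placeholders, not real statistics). It shows merge() adding a column for matching keys rather than overwriting anything.

```r
# Illustrative, invented values only.
income <- data.frame(city = c("A", "B", "C"),
                     avg_income = c(50, 60, 55),
                     stringsAsFactors = FALSE)
population <- data.frame(city = c("B", "C", "D"),
                         population = c(1000, 2000, 3000),
                         stringsAsFactors = FALSE)

# Default merge keeps only cities present in both tables (B and C).
both <- merge(income, population, by = "city")

# all = TRUE keeps every city and fills the gaps with NA.
all_cities <- merge(income, population, by = "city", all = TRUE)
```

In both results the two value columns sit side by side, which is why merge() alone does not perform an update.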
                                                                        
                                                        
            
            
              
                