Add multiple output variables using purrr and a predefined function

前端未结

关注

 3  813

Take this simple dataset and function (representative of more complex problems):

x <- data.frame(a = 1:3, b = 2:4)
mult <- function(a,b,n) (a + b) * n


                      
              相关标签:


      
      
        
          3条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  醉梦人生        
                
              
                            
                2021-01-05 06:44
              
            
            
                                                                       
To mimic the input format for Map, we could call pmap from purrr in this way:

x[paste0("new",seq_along(ns))] <- pmap(list(x['a'], x['b'], ns), mult)


To fit this in a pipe:

x %>%
    {list(.['a'], .['b'], ns)} %>%
    pmap(mult) %>%
    setNames(paste0('new', seq_along(ns))) %>%
    cbind(x)

#   new1 new2 a b
# 1    3    6 1 2
# 2    5   10 2 3
# 3    7   14 3 4


Apparently, this looks ugly compared to the concise base R code. But I could not think of a better way.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  面向向阳花        
                
              
                            
                2021-01-05 06:54
              
            
            
                                                                       
Here is one possibility.

library(purrr)
library(dplyr)
n <- 1:2
x %>%
    mutate(val = pmap(., mult, n = n)) %>%
    unnest() %>%
    mutate(var = rep(paste0("new", n), nrow(.) / length(n))) %>%
    spread(var, val)
#  a b new1 new2
#1 1 2    3    6
#2 2 3    5   10
#3 3 4    7   14


Not pretty, so I'm also curious to see alternatives. A lot of excess comes about from unnesting the list column and spreading into new columns.

Here is another possibility using pmap_dfc plus an ugly as.data.frame(t(...)) call

bind_cols(x, as.data.frame(t(pmap_dfc(x, mult, n = n))))
#  a b V1 V2
#1 1 2  3  6
#2 2 3  5 10
#3 3 4  7 14




Sample data

x <- data.frame(a = 1:3, b = 2:4)
mult <- function(a,b,n) (a + b) * n

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  囚心锁ツ        
                
              
                            
                2021-01-05 06:59
              
            
            
                                                                       
The best approach I've found (which is still not terribly elegant) is to pipe into bind_cols. To get pmap_dfr to work correctly, the function should return a named list (which may or may not be a data frame):



library(tidyverse)

x <- data.frame(a = 1:3, b = 2:4)
mult <- function(a,b,n) as.list(set_names((a + b) * n, paste0('new', n)))

x %>% bind_cols(pmap_dfr(., mult, n = 1:2))
#>   a b new1 new2
#> 1 1 2    3    6
#> 2 2 3    5   10
#> 3 3 4    7   14


To avoid changing the definition of mult, you can wrap it in an anonymous function:



mult <- function(a,b,n) (a + b) * n

x %>% bind_cols(pmap_dfr(
    ., 
    ~as.list(set_names(
        mult(...), 
        paste0('new', 1:2)
    )), 
    n = 1:2
))
#>   a b new1 new2
#> 1 1 2    3    6
#> 2 2 3    5   10
#> 3 3 4    7   14


In this particular case, it's not actually necessary to iterate over rows, though, because you can vectorize the inputs from x and instead iterate over n. The advantage is that usually n > p, so the number of iterations will be [potentially much] lower. To be clear, whether such an approach is possible depends on for which parameters the function can accept vector arguments.

mult still needs to be called on the variables of x. The simplest way to do this is to pass them explicitly:

x %>% bind_cols(map_dfc(1:2, ~mult(x$a, x$b, .x)))
#>   a b V1 V2
#> 1 1 2  3  6
#> 2 2 3  5 10
#> 3 3 4  7 14


...but this loses the benefit of pmap that named variables will automatically get passed to the correct parameter. You can get that back by using purrr::lift, which is an adverb that changes the domain of a function so it accepts a list by wrapping it in do.call. The returned function can be called on x and the value of n for that iteration:

x %>% bind_cols(map_dfc(1:2, ~lift(mult)(x, n = .x)))


This is equivalent to

x %>% bind_cols(map_dfc(1:2, ~invoke(mult, x, n = .x)))


but the advantage of the former is that it returns a function which can be partially applied on x so it only has an n parameter left, and thus requires no explicit references to x and so pipes better:

x %>% bind_cols(map_dfc(1:2, partial(lift(mult), .)))


All return the same thing. Names can be fixed after the fact with %>% set_names(~sub('^V(\\d+)$', 'new\\1', .x)), if you like.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复