How to use purrr's map function to perform row-wise prop.tests and add results to the dataframe?

I'm trying to solve the following problem in R: I have a dataframe with two variables (the number of successes and the total number of trials).

# A tibble: 4 x 2
  Success     N
    <dbl> <dbl>
1      38    50
2      12    50
3      27    50
4       9    50

2 answers

南笙 2021-01-03 11:57

We can use pmap after renaming the columns to match the argument names of 'prop.test' (x for the successes, n for the trials):
pmap(setNames(df, c("x", "n")), prop.test)
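A minimal, self-contained sketch of the pmap call above. The data frame is an assumption rebuilt from the values printed in this thread, and the variable name tests is only illustrative:

```r
library(purrr)
library(tibble)

# Example data, reconstructed from the values shown in this thread (assumption)
df <- tibble(Success = c(38, 12, 27, 9), N = c(50, 50, 50, 50))

# pmap() passes each row as a set of named arguments, so the columns
# must carry the names prop.test() expects: x (successes) and n (trials)
tests <- pmap(setNames(df, c("x", "n")), prop.test)

# Each element is an "htest" object; single pieces can be pulled out by name
map_dbl(tests, "p.value")
map_dbl(tests, ~ .x$estimate[["p"]])
```

Renaming matters because pmap matches the list names to the function's argument names; with the original names, prop.test would receive arguments called Success and N, which it does not have.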


Or use map2, which walks the two columns in parallel:
map2(df$Success, df$N, prop.test)


The problem with map is that it loops over the columns of the dataset, because a data frame is a list of vectors:
df %>%
   map(~ .x)
#$Success
#[1] 38 12 27  9

#$N
#[1] 50 50 50 50

So inside map, .x is a whole column vector, and we cannot do .x$Success or .x$N.
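The difference can be checked directly. This is only a sketch, using an example data frame rebuilt from the values in this thread (an assumption):

```r
library(purrr)
library(tibble)

# Example data, reconstructed from the values shown in this thread (assumption)
df <- tibble(Success = c(38, 12, 27, 9), N = c(50, 50, 50, 50))

# A data frame is a list of columns, so map() calls the function once
# per column, not once per row
by_column <- map(df, ~ .x)
length(by_column)   # 2, one element per column
map_chr(df, class)  # each .x is a plain numeric vector, so .x$Success fails

# map2() steps through the two columns together: one call per row
by_row <- map2(df$Success, df$N, prop.test)
length(by_row)      # 4, one element per row
```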
Update
As @Steven Beaupre mentioned, if we need to create new columns with the p-value and the confidence interval, we can run the tests inside mutate and extract the pieces:
res <- df %>%
        mutate(newcol = map2(Success, N, prop.test), 
            pval = map_dbl(newcol, ~ .x[["p.value"]]), 
            CI = map(newcol, ~ as.numeric(.x[["conf.int"]]))) %>% 
            select(-newcol) 
# A tibble: 4 x 4
#   Success     N      pval CI       
#    <dbl> <dbl>     <dbl> <list>   
#1   38.0   50.0 0.000407  <dbl [2]>  
#2   12.0   50.0 0.000407  <dbl [2]>
#3   27.0   50.0 0.671     <dbl [2]>
#4    9.00  50.0 0.0000116 <dbl [2]>

The 'CI' column is a list column holding two values per row (the lower and upper bounds). It can be unnested to put the data in 'long' format, with two rows per original row:
res %>%
   unnest(CI)
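To make the shape of the 'long' result concrete, here is a hedged sketch that rebuilds res from example data (the data frame and the bound label column are assumptions added for illustration):

```r
library(dplyr)
library(purrr)
library(tidyr)
library(tibble)

# Example data, reconstructed from the values shown in this thread (assumption)
df <- tibble(Success = c(38, 12, 27, 9), N = c(50, 50, 50, 50))

res <- df %>%
  mutate(test = map2(Success, N, prop.test),
         pval = map_dbl(test, ~ .x[["p.value"]]),
         CI   = map(test, ~ as.numeric(.x[["conf.int"]]))) %>%
  select(-test)

# Unnesting the list column repeats Success, N and pval for each bound
long <- res %>% unnest(CI)
nrow(long)   # 8: two rows (lower, then upper) per original row

# Hypothetical label column, so each row says which bound it holds
long$bound <- rep(c("lower", "upper"), times = nrow(res))
```

Naming the column in unnest(CI) avoids the warning that recent tidyr versions give when no column is specified.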


Or create three separate columns for the p-value and the two confidence bounds:
df %>% 
  mutate(newcol = map2(Success, N,  ~ prop.test(.x, n = .y) %>% 
                  {tibble(pvalue = .[["p.value"]],
                         CI_lower = .[["conf.int"]][[1]], 
                         CI_upper = .[["conf.int"]][[2]])})) %>%
  unnest(newcol)
# A tibble: 4 x 5
#  Success     N    pvalue CI_lower CI_upper
#    <dbl> <dbl>     <dbl>    <dbl>    <dbl>
#1   38.0   50.0 0.000407    0.615     0.865
#2   12.0   50.0 0.000407    0.135     0.385
#3   27.0   50.0 0.671       0.395     0.679
#4    9.00  50.0 0.0000116   0.0905    0.319


梦谈多话 2021-01-03 11:59

If you want a new column, you'd use @akrun's approach but sprinkle in a little dplyr and broom amongst the purrr:

library(tidyverse) # for dplyr, purrr, tidyr & co.
library(broom)

analysis <- df %>%
  set_names(c("x","n")) %>% 
  mutate(result = pmap(., prop.test)) %>% 
  mutate(result = map(result, tidy)) 
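
To see what each element of the result column holds, tidy can be applied to a single test. This is just an illustrative sketch; the name one_test and the input values (taken from the first row shown in this thread) are assumptions:

```r
library(broom)

# tidy() turns one "htest" result into a one-row data frame
one_test <- tidy(prop.test(x = 38, n = 50))

nrow(one_test)    # 1
names(one_test)   # includes p.value, conf.low and conf.high
```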


That gives you the results as a tidy nested tibble. If you want to keep only certain variables, use the same mutate/map pattern to apply a function such as select to each nested frame, then unnest the result column.

analysis %>% 
  mutate(result = map(result, ~select(.x, p.value, conf.low, conf.high))) %>% 
  unnest(result)

# A tibble: 4 x 5
      x     n   p.value conf.low conf.high
  <dbl> <dbl>     <dbl>    <dbl>     <dbl>
1 38.0   50.0 0.000407    0.615      0.865
2 12.0   50.0 0.000407    0.135      0.385
3 27.0   50.0 0.671       0.395      0.679
4  9.00  50.0 0.0000116   0.0905     0.319
