How to use purrr's map function to perform row-wise prop.tests and add results to the dataframe?

囚心锁ツ 2021-01-03 11:24

I'm trying to solve the following problem in R: I have a dataframe with two variables (number of successes and total number of trials).

# A tibble: 4 x 2
  Success     N
    <dbl> <dbl>
1   38.0   50.0
2   12.0   50.0
3   27.0   50.0
4    9.00  50.0

2 Answers
  • 2021-01-03 11:57

    We can use pmap after renaming the columns to match the argument names of 'prop.test' (x for the successes, n for the trials)

    pmap(setNames(df, c("x", "n")), prop.test)
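
    To make the name matching concrete, here is a minimal sketch. The data frame is rebuilt from the values printed further down in this answer, so treat it as an assumption about the original df.

    ```r
    library(purrr)

    # Assumed data: the Success and N values shown in the output of this answer
    df <- data.frame(Success = c(38, 12, 27, 9), N = c(50, 50, 50, 50))

    # pmap() passes each row's values as named arguments, so the columns
    # must be called x and n to match prop.test's first two parameters
    tests <- pmap(setNames(df, c("x", "n")), prop.test)

    # One result per row; each element is an object of class "htest"
    length(tests)
    class(tests[[1]])
    tests[[1]]$p.value
    ```

    Without the renaming, pmap would call prop.test(Success = ..., N = ...), which fails because prop.test has no arguments with those names.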
    

    Or using map2

    map2(df$Success, df$N, prop.test)
    

    The problem with map is that a data frame is a list of column vectors, so map loops over the columns of the dataset rather than over its rows

    df %>%
       map(~ .x)
    #$Success
    #[1] 38 12 27  9
    
    #$N
    #[1] 50 50 50 50
    

    So, we cannot do .x$Success or .x$N, because each .x is a plain numeric vector (one whole column), not a row
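
    If map itself is preferred, one possible workaround (a sketch, not part of the original answer) is to iterate over row indices and index the columns by hand. The df below is again assumed from the values printed above.

    ```r
    library(purrr)

    # Assumed data: the Success and N values shown above
    df <- data.frame(Success = c(38, 12, 27, 9), N = c(50, 50, 50, 50))

    # map() visits one column at a time: every .x is a numeric vector of length 4
    map_chr(df, class)
    map_int(df, length)

    # Iterating over row numbers lets a single-input map() reach both columns
    by_row <- map(seq_len(nrow(df)), ~ prop.test(df$Success[.x], df$N[.x]))
    length(by_row)
    ```

    This gives the same tests as map2, which is usually the tidier choice when exactly two inputs are involved.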

    Update

    As @Steven Beaupre mentioned, if we need new columns with the p-value and confidence interval, we can store the test results in a list column and extract the pieces from it

    res <- df %>%
      mutate(newcol = map2(Success, N, prop.test),                  # one test result per row
             pval = map_dbl(newcol, ~ .x[["p.value"]]),              # scalar p-value
             CI = map(newcol, ~ as.numeric(.x[["conf.int"]]))) %>%   # lower and upper bound
      select(-newcol)
    # A tibble: 4 x 4
    #   Success     N      pval CI       
    #    <dbl> <dbl>     <dbl> <list>   
    #1   38.0   50.0 0.000407  <dbl [2]>  
    #2   12.0   50.0 0.000407  <dbl [2]>
    #3   27.0   50.0 0.671     <dbl [2]>
    #4    9.00  50.0 0.0000116 <dbl [2]>
    

    The 'CI' column is a list column in which each element is a numeric vector of length 2 (lower and upper bound). It can be unnested to put the data in 'long' format, with two rows per test

    res %>%
       unnest(CI)
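
    As a quick check of what 'long' means here, the following sketch rebuilds res from assumed data (the values printed above) and counts the rows after unnesting. Naming the column in unnest() is a choice made for this sketch, since recent tidyr versions expect it.

    ```r
    library(dplyr)
    library(purrr)
    library(tidyr)

    # Assumed data: the Success and N values shown above
    df <- tibble(Success = c(38, 12, 27, 9), N = c(50, 50, 50, 50))

    res <- df %>%
      mutate(newcol = map2(Success, N, prop.test),
             pval = map_dbl(newcol, ~ .x[["p.value"]]),
             CI = map(newcol, ~ as.numeric(.x[["conf.int"]]))) %>%
      select(-newcol)

    # Each length-2 CI vector becomes two rows: lower bound, then upper bound
    long <- res %>% unnest(CI)
    nrow(long)
    ```

    The other columns (Success, N, pval) are repeated on both rows of each pair.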
    

    Or create 3 separate columns (p-value, lower and upper confidence limit) so that each test stays on one row

    df %>% 
      mutate(newcol = map2(Success, N,  ~ prop.test(.x, n = .y) %>% 
                      {tibble(pvalue = .[["p.value"]],
                             CI_lower = .[["conf.int"]][[1]], 
                             CI_upper = .[["conf.int"]][[2]])})) %>%
      unnest(newcol)
    # A tibble: 4 x 5
    #  Success     N    pvalue CI_lower CI_upper
    #    <dbl> <dbl>     <dbl>    <dbl>    <dbl>
    #1   38.0   50.0 0.000407    0.615     0.865
    #2   12.0   50.0 0.000407    0.135     0.385
    #3   27.0   50.0 0.671       0.395     0.679
    #4    9.00  50.0 0.0000116   0.0905    0.319
    
  • 2021-01-03 11:59

    If you want a new column, you can use @akrun's approach but sprinkle in a little dplyr and broom amongst the purrr:

    library(tidyverse) # for dplyr, purrr, tidyr & co.
    library(broom)
    
    analysis <- df %>%
      set_names(c("x","n")) %>% 
      mutate(result = pmap(., prop.test)) %>% 
      mutate(result = map(result, tidy)) 
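
    For orientation, this small sketch applies tidy to a single test so the available columns can be inspected. The inputs 38 and 50 are taken from the first row of the data shown in this thread.

    ```r
    library(broom)

    # A single test on the first row's values (38 successes out of 50 trials)
    one_test <- tidy(prop.test(38, 50))

    # tidy() returns a one-row tibble; its column names are what select() can pick from
    nrow(one_test)
    names(one_test)
    ```

    The names include p.value, conf.low and conf.high, which are the columns kept in the next step.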
    

    That gives you the results as a nested tibble, with one tidy one-row tibble per test in the result column. If you want to keep only certain variables, use the same mutate/map pattern to apply select to each nested frame, then unnest.

    analysis %>% 
      mutate(result = map(result, ~select(.x, p.value, conf.low, conf.high))) %>% 
      unnest(result)
    
    # A tibble: 4 x 5
          x     n   p.value conf.low conf.high
      <dbl> <dbl>     <dbl>    <dbl>     <dbl>
    1 38.0   50.0 0.000407    0.615      0.865
    2 12.0   50.0 0.000407    0.135      0.385
    3 27.0   50.0 0.671       0.395      0.679
    4  9.00  50.0 0.0000116   0.0905     0.319
    