dplyr::mutate to add multiple values

旧时难觅i 2020-11-30 04:15

There are already a couple of issues about this on the dplyr GitHub repo, and at least one related SO question, but I don't think any of them quite covers my question.

6 Answers
  • 2020-11-30 04:17

    Yet another option could be to use the purrr::map family of functions.

    If you replace rbind with dplyr::bind_rows in the get_binCI function:

    library(tidyverse)
    
    dd <- data.frame(x = c(3, 4), n = c(10, 11))
    get_binCI <- function(x, n) {
      bind_rows(setNames(c(binom.test(x, n)$conf.int), c("lwr", "upr")))
    }
    

    You can use purrr::map2 with tidyr::unnest:

    dd %>% mutate(result = map2(x, n, get_binCI)) %>% unnest(result)
    
    #>   x  n        lwr       upr
    #> 1 3 10 0.06673951 0.6524529
    #> 2 4 11 0.10926344 0.6920953
    

    Or purrr::map2_dfr with dplyr::bind_cols:

    dd %>% bind_cols(map2_dfr(.$x, .$n, get_binCI))
    
    #>   x  n        lwr       upr
    #> 1 3 10 0.06673951 0.6524529
    #> 2 4 11 0.10926344 0.6920953
    
  • 2020-11-30 04:26

    Here's a quick solution using the data.table package instead.

    First, a small change to the function so that it returns a named list:

    get_binCI <- function(x,n) as.list(setNames(binom.test(x,n)$conf.int, c("lwr", "upr")))
    

    Then, simply

    library(data.table)
    setDT(dd)[, get_binCI(x, n), by = .(x, n)]
    #    x  n        lwr       upr
    # 1: 3 10 0.06673951 0.6524529
    # 2: 4 11 0.10926344 0.6920953
    
  • 2020-11-30 04:27

    Old question (with plenty of good answers), but this is a great use case for tidyverse's broom package, which deals with tidying output from test and modeling objects (such as binom.test, lm, etc).

    It's more verbose than other methods, but I think it matches your desire for a more expressive approach.

    The process is:

    1. Define the groups that you'll run binom.test on (in this case, those groups are defined by x and n) and nest them, creating separate data.frames for each (within the full data.frame)
    2. map the binom.test call to the x and n values from each group
    3. tidy the binom.test output for each group (this is where broom comes in)
    4. unnest the tidied test output data.frames into the full data.frame

    Now you're left with a data.frame where each row contains the x and n values, combined with all of the output from the corresponding binom.test, neatly formatted with separate columns for each bit of output information (point estimate, upper/lower conf, p-value, etc).

    library(tidyverse)
    library(broom)
    dd <- data.frame(x=c(3,4),n=c(10,11))
    dd %>%
      group_by(x, n) %>%
      nest() %>%
      mutate(test = map(data, ~tidy(binom.test(x, n)))) %>%
      unnest(test)
    #> # A tibble: 2 x 11
    #> # Groups:   x, n [2]
    #>       x     n data  estimate statistic p.value parameter conf.low conf.high
    #>   <dbl> <dbl> <lis>    <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl>
    #> 1     3    10 <tib…    0.3           3   0.344        10   0.0667     0.652
    #> 2     4    11 <tib…    0.364         4   0.549        11   0.109      0.692
    #> # … with 2 more variables: method <chr>, alternative <chr>
    

    From here you can get to your exact desired format with just a bit more manipulation, selecting the desired output variables, and renaming them:

    dd %>%
      group_by(x, n) %>%
      nest() %>%
      mutate(test = map(data, ~tidy(binom.test(x, n)))) %>%
      unnest(test) %>%
      rename(lwr = conf.low, upr = conf.high) %>%
      select(x, n, lwr, upr)
    #> # A tibble: 2 x 4
    #> # Groups:   x, n [2]
    #>       x     n    lwr   upr
    #>   <dbl> <dbl>  <dbl> <dbl>
    #> 1     3    10 0.0667 0.652
    #> 2     4    11 0.109  0.692
    

    As mentioned, it's verbose. Much more so than (for example) @joran's beautifully succinct

    dd %>% 
        group_by(x,n) %>%
        do(foo(.$x,.$n))
    

    However, the benefit of the broom approach is that you won't need to define a function foo (or get_binCI). It's fully self-contained, and in my opinion far more expressive and flexible.

  • 2020-11-30 04:29

    Here are some possibilities with rowwise and nesting.

    library("dplyr")
    library("tidyr")
    

    A data frame with repeated x/n combinations, for fun:

    dd <- data.frame(x=c(3, 4, 3), n=c(10, 11, 10))
    

    A version of the CI function that returns a data frame, like @Joran's:

    get_binCI_df <- function(x,n) {
      binom.test(x, n)$conf.int %>% 
        setNames(c("lwr", "upr")) %>% 
        as.list() %>% as.data.frame()
    }
    

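    Grouping by x and n as before removes the duplicate. The duplicated group has two rows, so .$x and .$n are length-2 vectors there; binom.test reads a length-2 x as (successes, failures) and ignores n, which silently gives the interval for 3 out of 6. Passing only the first element of each avoids that:

    dd %>% group_by(x,n) %>% do(get_binCI_df(.$x[1],.$n[1]))
    # # A tibble: 2 x 4
    # # Groups:   x, n [2]
    #       x     n        lwr       upr
    #   <dbl> <dbl>      <dbl>     <dbl>
    # 1     3    10 0.06673951 0.6524529
    # 2     4    11 0.10926344 0.6920953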
    

    Using rowwise keeps all the rows but drops x and n unless you put them back with cbind(., ...) (as Ben does in his original post).

    dd %>% rowwise() %>% do(cbind(., get_binCI_df(.$x,.$n)))
    # Source: local data frame [3 x 4]
    # Groups: <by row>
    #   
    # # A tibble: 3 x 4
    #       x     n        lwr       upr
    # * <dbl> <dbl>      <dbl>     <dbl>
    # 1     3    10 0.06673951 0.6524529
    # 2     4    11 0.10926344 0.6920953
    # 3     3    10 0.06673951 0.6524529
    

    It feels like nesting could work more cleanly, but this is as good as I can get. Using mutate means I can use x and n directly instead of .$x and .$n, but mutate expects a single value, so it needs to be wrapped in list.

    dd %>% rowwise() %>% mutate(ci=list(get_binCI_df(x, n))) %>% unnest(ci)
    # # A tibble: 3 x 4
    #       x     n        lwr       upr
    #   <dbl> <dbl>      <dbl>     <dbl>
    # 1     3    10 0.06673951 0.6524529
    # 2     4    11 0.10926344 0.6920953
    # 3     3    10 0.06673951 0.6524529
    

    Finally, it looks like something along these lines is an open issue for dplyr (as of 5 Oct 2017); see https://github.com/tidyverse/dplyr/issues/2326. If it is implemented, that will be the easiest way!
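
    In newer dplyr releases (1.0.0 and later, to my knowledge), mutate() creates several columns at once when an unnamed expression returns a data frame. A minimal sketch under that assumption, using the same data as above; get_binCI_df is rewritten here without the pipe only to keep the block self-contained:

    ```r
    library(dplyr)

    # Same example data as above, including the repeated x/n combination
    dd <- data.frame(x = c(3, 4, 3), n = c(10, 11, 10))

    # Returns a one-row data frame with the two confidence limits
    get_binCI_df <- function(x, n) {
      ci <- binom.test(x, n)$conf.int
      data.frame(lwr = ci[1], upr = ci[2])
    }

    res <- dd %>%
      rowwise() %>%                  # evaluate the call once per row, with scalar x and n
      mutate(get_binCI_df(x, n)) %>% # unnamed data-frame result is spliced into lwr and upr
      ungroup()                      # drop the row-wise grouping

    res
    ```

    This keeps all three rows, like the rowwise variants above, without needing list() or unnest().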

  • 2020-11-30 04:36

    Yet another variant, although I think we're all splitting hairs here.

    > dd <- data.frame(x=c(3,4),n=c(10,11))
    > get_binCI <- function(x,n) {
    +   as_tibble(setNames(as.list(binom.test(x,n)$conf.int),c("lwr","upr")))
    + }
    > 
    > dd %>% 
    +   group_by(x,n) %>%
    +   do(get_binCI(.$x,.$n))
    Source: local data frame [2 x 4]
    Groups: x, n
    
      x  n        lwr       upr
    1 3 10 0.06673951 0.6524529
    2 4 11 0.10926344 0.6920953
    

    Personally, if we're just going by readability, I find this preferable:

    foo  <- function(x,n){
        bi <- binom.test(x,n)$conf.int
        tibble(lwr = bi[1],
               upr = bi[2])
    }
    
    dd %>% 
        group_by(x,n) %>%
        do(foo(.$x,.$n))
    

    ...but now we're really splitting hairs.

  • 2020-11-30 04:36

    This uses a "standard" dplyr workflow, but as @BenBolker notes in the comments, it requires calling get_binCI twice:

    dd %>% group_by(x,n) %>%
      mutate(lwr=get_binCI(x,n)[1],
             upr=get_binCI(x,n)[2])
    
      x  n        lwr       upr
    1 3 10 0.06673951 0.6524529
    2 4 11 0.10926344 0.6920953
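
    One way to avoid the second call, sketched with purrr: compute each interval once into a list column, then pull out its two elements. The get_binCI definition here (returning a plain length-2 numeric vector) is an assumption, since that is what the [1]/[2] indexing above implies.

    ```r
    library(dplyr)
    library(purrr)

    dd <- data.frame(x = c(3, 4), n = c(10, 11))

    # Assumed definition: returns c(lower, upper) as a plain numeric vector
    get_binCI <- function(x, n) as.numeric(binom.test(x, n)$conf.int)

    res <- dd %>%
      mutate(ci  = map2(x, n, get_binCI),  # one binom.test call per row
             lwr = map_dbl(ci, 1),         # first element of each interval
             upr = map_dbl(ci, 2)) %>%     # second element of each interval
      select(-ci)                          # drop the intermediate list column

    res
    ```

    The intermediate ci column costs one extra line but keeps the workflow to a single mutate().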
    