Number of significant digits in dplyr summarise

孤街浪徒 2021-01-08 00:24

I am having trouble getting the desired number of decimal places from summarise. Here is a simple example:

test2  <- data.frame(c("a","a","b","b")


        
3 Answers
  • 2021-01-08 00:37

    I think the simplest solution is the following:

    test2  <- data.frame(c("a","a","b","b"), c(245,246,247,248))
    library(dplyr)
    colnames(test2)  <- c("V1","V2")
    group_by(test2,V1) %>% summarise(`mean(V2)` = sprintf("%0.1f",mean(V2)))
    # A tibble: 2 x 2
      V1    `mean(V2)`
      <fct> <chr>     
    1 a     245.5     
    2 b     247.5     
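
    One caveat is that sprintf returns text, which is why the column above is shown as <chr>. Below is a minimal sketch in base R, assuming the two group means are simply typed in by hand; the names means and labels are made up for illustration.

    ```r
    # Group means from the output above, entered by hand for illustration.
    means <- c(245.5, 247.5)

    # sprintf() fixes the number of decimals but returns a character vector.
    labels <- sprintf("%0.1f", means)

    print(is.numeric(means))     # the original values are numeric
    print(is.character(labels))  # the formatted values are text

    # Convert back to numeric before doing any arithmetic on the labels.
    restored <- as.numeric(labels)
    print(restored + 1)
    ```

    So it is probably best to format only at the very end, once no further calculations on the column are needed.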
    
  • 2021-01-08 00:49

    Because you are using dplyr tools, the result is a tibble, which by default prints numbers with 3 significant digits (see the option pillar.sigfig). That is not the same as the number of digits after the decimal point. To get the latter, simply convert the result to a data.frame with as.data.frame.

    Note that tibble's concept of significant digits is somewhat complicated: it does not indicate how many digits are shown after the decimal point, but the minimum number of digits needed to represent the number to a given accuracy (I think 99.9%, see discussion here).

    This means the number of digits printed depends on the "size" of your number:

    library(tibble)
    packageVersion("tibble")
    #> [1] '2.1.3'
    packageVersion("pillar")
    #> [1] '1.4.2'
    tab <- tibble(x = c(0.1234, 1.1234, 10.1234, 100.1234, 1000.1234))
    
    options(pillar.sigfig=3)
    tab
    #> # A tibble: 5 x 1
    #>          x
    #>      <dbl>
    #> 1    0.123
    #> 2    1.12 
    #> 3   10.1  
    #> 4  100.   
    #> 5 1000.
    
    options(pillar.sigfig=4)
    tab
    #> # A tibble: 5 x 1
    #>           x
    #>       <dbl>
    #> 1    0.1234
    #> 2    1.123 
    #> 3   10.12  
    #> 4  100.1   
    #> 5 1000.
    
    as.data.frame(tab)
    #>           x
    #> 1    0.1234
    #> 2    1.1234
    #> 3   10.1234
    #> 4  100.1234
    #> 5 1000.1234
    

    Created on 2019-08-21 by the reprex package (v0.3.0)
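
    The difference between significant digits and decimal places can also be sketched with base R's signif() and round(). This is only an analogy for what the tibble printing does, not how pillar is implemented; the names sig3 and dec3 are made up for illustration.

    ```r
    # Same values as in the tibble example above.
    x <- c(0.1234, 1.1234, 10.1234, 100.1234, 1000.1234)

    # signif() keeps a fixed number of significant digits, so fewer decimals
    # survive as the magnitude of the number grows.
    sig3 <- signif(x, 3)

    # round() keeps a fixed number of digits after the decimal point,
    # regardless of magnitude.
    dec3 <- round(x, 3)

    print(sig3)
    print(dec3)

    # Group means such as 245.5 and 247.5 need 4 significant digits
    # to keep their single decimal.
    means4 <- signif(c(245.5, 247.5), 4)
    print(means4)
    ```

    For values in the hundreds, one decimal place therefore corresponds to 4 significant digits, which is why the default of 3 is not enough here.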

  • 2021-01-08 00:54

    This is one solution:

    test2  <- data.frame(c("a", "a", "b", "b"), c(245, 246, 247, 248))
    library(dplyr)
    colnames(test2)  <- c("V1", "V2")
    group_by(test2, V1) %>% 
      dplyr::summarise(mean(V2)) %>% 
      dplyr::mutate_if(is.numeric, format, nsmall = 1)
    #> # A tibble: 2 x 2
    #>   V1    `mean(V2)`
    #>   <fct> <chr>     
    #> 1 a     245.5     
    #> 2 b     247.5
    

    Created on 2018-01-20 by the reprex package (v0.1.1.9000).
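
    When using format, it may help to know that its nsmall argument sets a minimum number of decimals and does not round. A small base R sketch, with made-up values and variable names:

    ```r
    # nsmall pads with zeros up to the requested number of decimals.
    padded <- format(247, nsmall = 1)
    print(padded)

    # Without nsmall, a whole number is printed without decimals.
    plain <- format(247)
    print(plain)

    # nsmall is only a minimum: extra decimals are kept, not rounded away.
    unrounded <- format(245.456, nsmall = 1)
    print(unrounded)

    # Rounding first gives exactly one decimal.
    one_decimal <- format(round(245.456, 1), nsmall = 1)
    print(one_decimal)
    ```

    Like sprintf, format returns character, so the resulting column is text rather than numeric.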
