Getting same output as cut() using speedier hist() or findInterval()?

执笔经年 2021-01-27 00:11

I read this article http://www.r-bloggers.com/comparing-hist-and-cut-r-functions/ and found hist() to be about 4 times faster than cut() on my PC. My sc

2 Answers
  •  失恋的感觉
    2021-01-27 01:07

    The hist function counts observations per bin, much like combining table and cut. For example,

    set.seed(1)
    x <- rnorm(100)
    
    hist(x, plot = FALSE)
    ## $breaks
    ##  [1] -2.5 -2.0 -1.5 -1.0 -0.5  0.0  0.5  1.0  1.5  2.0  2.5
    ## 
    ## $counts
    ##  [1]  1  3  7 14 21 20 19  9  4  2
    
    table(cut(x, seq.int(-2.5, 2.5, 0.5)))
    ## (-2.5,-2] (-2,-1.5] (-1.5,-1] (-1,-0.5]  (-0.5,0]   (0,0.5]   (0.5,1]
    ##         1         3         7        14        21        20        19
    ##   (1,1.5]   (1.5,2]   (2,2.5] 
    ##         9         4         2
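    The two outputs above can be checked against each other directly. This is a minimal sketch; it passes the same break vector (named `b` here purely for illustration) to both functions, because hist() otherwise chooses its own breaks via Sturges' rule. Note that hist() defaults to include.lowest = TRUE while cut() defaults to include.lowest = FALSE, so the counts could differ if a value fell exactly on the lowest break; that doesn't happen with this continuous sample.

    ```r
    set.seed(1)
    x <- rnorm(100)
    b <- seq.int(-2.5, 2.5, 0.5)  # illustrative name for the shared break points

    # Per-bin counts from hist(), without drawing the plot
    hist_counts <- hist(x, breaks = b, plot = FALSE)$counts

    # The same counts via cut() + table(); as.vector() drops names and dim
    cut_counts <- as.vector(table(cut(x, b)))

    all(hist_counts == cut_counts)  # TRUE
    ```
    
    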
    

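    If you want the raw output from cut (a factor recording the bin of each observation), you can't use hist, which only returns per-bin summaries such as counts.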
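    To make the difference concrete: cut() returns a factor with one element per observation, whereas hist() returns a "histogram" object holding bin-level summaries (breaks, counts, density, mids, ...). A minimal sketch, reusing the simulated x and a break vector named `b` (introduced here for illustration):

    ```r
    set.seed(1)
    x <- rnorm(100)
    b <- seq.int(-2.5, 2.5, 0.5)

    per_obs <- cut(x, b)                          # factor: one bin label per observation
    per_bin <- hist(x, breaks = b, plot = FALSE)  # list of per-bin summaries

    length(per_obs)         # 100, one entry per element of x
    length(per_bin$counts)  # 10, one count per bin
    head(per_obs, 3)        # bins of the first three observations
    ```
    
    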

    However, if the speed of cut is a problem (and you might want to double-check that it really is the slow part of your analysis; see "premature optimization is the root of all evil"), then you can use the lower-level .bincode. This skips the input checking and label creation that cut performs.

    .bincode(x, seq.int(-2.5, 2.5, 0.5))
    ## [1]  4  6  4  9  6  4  6  7  7  5  9  6 ...
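    Because .bincode() returns the same integer codes that underlie cut()'s factor, both cut-style and hist-style output can be recovered from it. The sketch below assumes the `.bincode(x, breaks, right = TRUE, include.lowest = FALSE)` signature of current base R; as a dot-prefixed internal function, its interface is not guaranteed to stay stable. It also shows findInterval(), mentioned in the question title: with left.open = TRUE (available since R 3.3.0) it uses the same (a, b] intervals, but for values outside the breaks it returns 0 or length(breaks) where .bincode() and cut() return NA. The variable names and the hand-built labels are illustrative, not part of any API.

    ```r
    set.seed(1)
    x <- rnorm(100)
    b <- seq.int(-2.5, 2.5, 0.5)

    # Integer bin codes for (a, b] intervals; NA for values outside the breaks
    codes <- .bincode(x, b)

    # They match the integer codes behind cut()'s factor
    all(codes == as.integer(cut(x, b)))  # TRUE

    # Per-bin counts, as hist() or table(cut()) would give them
    counts <- tabulate(codes, nbins = length(b) - 1)

    # Rebuild a factor only when labels are actually needed; these "(a,b]"
    # strings are a hand-built stand-in for cut()'s own label formatting
    labs <- paste0("(", head(b, -1), ",", b[-1], "]")
    f <- factor(codes, levels = seq_along(labs), labels = labs)

    # findInterval() with left.open = TRUE agrees for values inside (b[1], b[length(b)]]
    fi <- findInterval(x, b, left.open = TRUE)
    all(fi == codes)  # TRUE
    ```

    Whether this is actually faster than cut() on your data is worth checking with system.time() before relying on an internal function.
    
    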
    
