Getting frequency values from histogram in R

误落风尘 2020-12-13 00:18

I know how to draw histograms and build other frequency or percentage tables in R. How can I extract the frequency values behind a histogram into a table, so that I can use them afterwards?

3 Answers
  • 2020-12-13 00:39

    From ?hist, section "Value":

    an object of class "histogram" which is a list with components:

    • breaks: the n+1 cell boundaries (= breaks if that was a vector). These are the nominal breaks, not with the boundary fuzz.
    • counts: n integers; for each cell, the number of x[] inside.
    • density: values f^(x[i]), as estimated density values. If all(diff(breaks) == 1), they are the relative frequencies counts/n and in general satisfy sum[i; f^(x[i]) (b[i+1]-b[i])] = 1, where b[i] = breaks[i].
    • intensities: same as density. Deprecated, but retained for compatibility.
    • mids: the n cell midpoints.
    • xname: a character string with the actual x argument name.
    • equidist: logical, indicating if the distances between breaks are all the same.

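    breaks and counts provide just about all you need: the bin boundaries and the frequency in each bin. density equals the relative frequency only when every bin has width 1.

    x <- rnorm(100)          # example data; substitute your own vector
    histrv <- hist(x)
    histrv$breaks            # bin boundaries
    histrv$counts            # frequency in each bin
    histrv$density           # density estimate in each bin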
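
    The components listed above can be combined into a single frequency table. This is a minimal sketch, not part of the original answer: the example data, the name freq_table, and its column names are assumptions chosen for illustration.

    ```r
    # Example data (assumed; any numeric vector works)
    set.seed(42)
    x <- rnorm(100)

    # plot = FALSE computes the histogram without drawing it
    h <- hist(x, plot = FALSE)

    # One row per bin: boundaries, midpoint, count, relative frequency
    freq_table <- data.frame(
      lower    = head(h$breaks, -1),   # left edge of each bin
      upper    = tail(h$breaks, -1),   # right edge of each bin
      mid      = h$mids,
      count    = h$counts,
      rel_freq = h$counts / sum(h$counts)
    )
    print(freq_table)
    ```

    Dividing counts by their sum gives relative frequencies for any bin width, whereas density has to be multiplied by the bin width first.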
    
  • 2020-12-13 00:49

    Just in case someone hits this question with ggplot's geom_histogram in mind, note that there is a way to extract the data from a ggplot object.

    The following convenience function outputs a dataframe with the lower limit of each bin (xmin), the upper limit of each bin (xmax), the mid-point of each bin (x), as well as the frequency value (y).

    library(ggplot2)

    ## Convenience function: pull the computed bin data out of the first layer
    get_hist <- function(p) {
        d <- ggplot_build(p)$data[[1]]
        data.frame(x = d$x, xmin = d$xmin, xmax = d$xmax, y = d$y)
    }
    
    # make a dataframe for ggplot
    set.seed(1)
    x <- runif(100, 0, 10)
    y <- cumsum(x)
    df <- data.frame(x = sort(x), y = y)
    
    # make geom_histogram (cumulative counts on the y axis);
    # after_stat(count) replaces the deprecated ..count.. notation (ggplot2 >= 3.3.0)
    p <- ggplot(data = df, aes(x = x)) + 
        geom_histogram(aes(y = cumsum(after_stat(count))), binwidth = 1, boundary = 0,
                    color = "black", fill = "white")
    

    Illustration:

    hist_data <- get_hist(p)   # named hist_data to avoid masking base::hist
    head(hist_data$x)
    ## [1] 0.5 1.5 2.5 3.5 4.5 5.5
    head(hist_data$y)
    ## [1]  7 13 24 38 52 57
    head(hist_data$xmax)
    ## [1] 1 2 3 4 5 6
    head(hist_data$xmin)
    ## [1] 0 1 2 3 4 5
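
    As a possible alternative to a hand-written helper, ggplot2 also exports layer_data(), which returns the computed data for one layer of a plot. The sketch below is not from the original answer; it assumes a plain (non-cumulative) histogram so that the count column holds the per-bin frequency, and the names df, p and bins are illustrative.

    ```r
    library(ggplot2)

    # Example data (assumed for illustration)
    set.seed(1)
    df <- data.frame(x = runif(100, 0, 10))

    # Plain histogram: the y axis is the per-bin count
    p <- ggplot(df, aes(x = x)) +
      geom_histogram(binwidth = 1, boundary = 0)

    # layer_data() returns the computed data for layer 1 (the histogram)
    bins <- layer_data(p, 1)
    bins[, c("xmin", "xmax", "x", "count")]
    ```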
    

    See also a related question I answered: "Cumulative histogram with ggplot2".

  • 2020-12-13 01:02

    The hist function has a return value (an object of class histogram):

    R> res <- hist(rnorm(100))
    R> res
    $breaks
    [1] -4 -3 -2 -1  0  1  2  3  4
    
    $counts
    [1]  1  2 17 27 34 16  2  1
    
    $intensities
    [1] 0.01 0.02 0.17 0.27 0.34 0.16 0.02 0.01
    
    $density
    [1] 0.01 0.02 0.17 0.27 0.34 0.16 0.02 0.01
    
    $mids
    [1] -3.5 -2.5 -1.5 -0.5  0.5  1.5  2.5  3.5
    
    $xname
    [1] "rnorm(100)"
    
    $equidist
    [1] TRUE
    
    attr(,"class")
    [1] "histogram"
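
    The counts in the returned object can be cross-checked without hist() by binning the data with cut() and tabulating with table(). This is a sketch under stated assumptions: the example data and the names bins and freq are illustrative, and the right and include.lowest arguments are set to mirror the defaults of hist().

    ```r
    # Example data (assumed for illustration)
    set.seed(42)
    x <- rnorm(100)
    res <- hist(x, plot = FALSE)

    # hist() defaults to right-closed bins with the lowest value included,
    # so cut() is given the same settings and the same breaks
    bins <- cut(x, breaks = res$breaks, right = TRUE, include.lowest = TRUE)
    freq <- table(bins)

    # Compare the tabulated frequencies with res$counts
    print(freq)
    all(as.vector(freq) == res$counts)
    ```

    The table() result carries the interval labels as names, which can be convenient when the frequencies are printed or merged with other data.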
    