Calculating an area under a continuous density plot

旧巷少年郎 2021-02-10 16:07

I have two density curves plotted using this:

Network <- Mydf$Networks
quartiles <-  quantile(Mydf$Avg.Position,  probs=c(25,50,75)/100)
density <- ggpl         


        
2 answers
  • 2021-02-10 16:28

    Calculate the density separately and plot that to start with. You can then estimate the area with basic arithmetic. An integral is approximated by adding up the areas of a set of narrow strips. I use the trapezoidal rule for that: the width of each strip is the difference between two consecutive x-values, and its height is the mean of the y-values at the beginning and end of the interval. I use the rollmean function from the zoo package, but this can be done in base R too.

    library(zoo)  # provides rollmean()
    
    X <- rnorm(100)
    # calculate the density and check the plot
    Y <- density(X)  # see ?density for parameters
    plot(Y$x, Y$y, type = "l")  # can use ggplot for this too
    # set an Avg.position value
    Avg.pos <- 1
    
    # construct widths (xt) and mean heights (yt) of the strips left of Avg.pos
    xt <- diff(Y$x[Y$x < Avg.pos])
    yt <- rollmean(Y$y[Y$x < Avg.pos], 2)
    # the sum of width * height gives the area to the left of Avg.pos
    sum(xt * yt)
    

    This gives a good approximation, accurate to about three decimal places. If you know the density function, take a look at ?integrate.
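The same strip sum can be written without zoo. The following is a minimal base-R sketch, not the answerer's own code: the fixed seed and the names `area_trap` and `area_exact` are assumptions added for illustration, and the `integrate` call uses the standard normal only because that is the distribution the sample was drawn from.

```r
set.seed(42)        # assumption: fixed seed so the run is reproducible
X <- rnorm(100)
Y <- density(X)
Avg.pos <- 1

# Keep only the grid points to the left of the cut-off
keep <- Y$x < Avg.pos
x <- Y$x[keep]
y <- Y$y[keep]

# Trapezoidal rule: width of each interval times the mean of its two end heights
area_trap <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

# When the density function itself is known, integrate() can be applied directly
area_exact <- integrate(dnorm, lower = -Inf, upper = Avg.pos)$value

area_trap    # estimate based on the kernel density of the sample
area_exact   # area under the true standard normal density, i.e. pnorm(1)
```

The two numbers will generally differ a little, because the first is computed from a density estimated on 100 random draws while the second comes from the true density.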

  • 2021-02-10 16:39

    Three possibilities:

    The logspline package provides a different method of estimating density curves, and it includes pnorm-style functions (such as plogspline) for the result.

    You could also approximate the area by feeding the x and y values returned by the density function to the approxfun function and passing the result to the integrate function. Unless you are interested in precise estimates of small tail areas (or very small intervals), this will probably give a reasonable approximation.
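A rough sketch of the approxfun/integrate route is shown here. The sample data, the seed, the cut-off of 1, and the `subdivisions` setting are all assumptions chosen for illustration; the interpolated function is set to 0 outside the density grid so that it is defined everywhere.

```r
set.seed(42)        # assumption: fixed seed so the run is reproducible
X <- rnorm(100)
d <- density(X)

# Linear interpolation of the density grid; returns 0 outside the grid
f <- approxfun(d$x, d$y, yleft = 0, yright = 0)

# Integrate from the left edge of the grid up to the cut-off.
# Finite limits are used because f is only non-zero on the grid.
cutoff <- 1
area <- integrate(f, lower = min(d$x), upper = cutoff,
                  subdivisions = 1000L)$value

# Sanity check: the area over the whole grid should be close to 1
total <- integrate(f, lower = min(d$x), upper = max(d$x),
                   subdivisions = 1000L)$value

area
total
```

If `integrate` reports that the maximum number of subdivisions was reached, increasing `subdivisions` further is usually enough for a piecewise-linear function like this one.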

    Density estimates are just sums of kernels centered at the data points; one such kernel is the normal distribution. You could average the areas from pnorm (or the corresponding function for another kernel), with the sd defined by the bandwidth and the mean set to each data point.
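The kernel-averaging idea can be sketched in a few lines, assuming the default Gaussian kernel of `density`, for which the returned `bw` is the standard deviation of the kernel. The seed, the cut-off, and the name `area_kernel` are illustrative assumptions.

```r
set.seed(42)        # assumption: fixed seed so the run is reproducible
X <- rnorm(100)
d <- density(X)     # default kernel is Gaussian; d$bw is the kernel sd
cutoff <- 1

# Each observation contributes a normal kernel with mean X[i] and sd d$bw.
# The estimated area to the left of the cut-off is the average of the
# individual kernel CDFs evaluated at the cut-off.
area_kernel <- mean(pnorm(cutoff, mean = X, sd = d$bw))

area_kernel
```

This avoids the evaluation grid entirely, so it does not depend on the number of points at which `density` evaluates the curve.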
