Altering distribution of one dataset to match another dataset

野趣味 2021-01-16 02:50

I have 2 datasets, one of modeled (artificial) data and another with observed data. They have slightly different statistical distributions, and I want to force the modeled data to match the distribution of the observed data.

2 answers
  • 2021-01-16 03:34

    Have a look at the answer to "How to generate distributions given, mean, SD, skew and kurtosis in R?".

    It discusses the SuppDists package, which lets you define a distribution through a set of parameters in the Johnson system of distributions.
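    A minimal sketch of the Johnson-system idea, assuming the SuppDists package is installed. The vector observed below is hypothetical stand-in data, not the question's data. JohnsonFit() estimates the Johnson parameters from a data vector, and rJohnson() draws new values from the fitted distribution. This version fits from sample quantiles (moment = "quant") rather than from a target mean, SD, skewness and kurtosis, so treat it as an illustration, not as the linked answer's exact method.

```r
# Sketch: fit a Johnson-system distribution to data and sample from it.
# Assumes SuppDists is installed; `observed` is HYPOTHETICAL stand-in data.
library(SuppDists)

set.seed(1)
observed <- rgamma(500, shape = 3, rate = 1)   # hypothetical skewed sample

# Estimate Johnson parameters (gamma, delta, xi, lambda, type)
# from quantiles of the observed data
parms <- JohnsonFit(observed, moment = "quant")

# Draw new "modeled" values from the fitted Johnson distribution
modeled <- rJohnson(1000, parms)

# Compare summaries of the two samples
summary(observed)
summary(modeled)
```

    Swap in your own observed vector; the fitted type component tells you which Johnson family (bounded, unbounded, lognormal, normal) was selected.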

  • 2021-01-16 03:55

    Are you just modeling the distribution of the observed data? If so, you could generate a kernel density estimate from the observations and then resample from that modeled density. For example:

    library(ggplot2)
    

    First we generate a density estimate from the observed values. This is our model of the distribution of the observed values. adjust multiplies the automatically selected bandwidth (the bandwidth actually used is adjust*bw), so the default value of 1 keeps the default bandwidth. Smaller values result in less smoothing (i.e., a density estimate that more closely follows small-scale structure in the data):

    dens.obs = density(observed, adjust=0.8)
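    A quick way to see what adjust does, sketched with hypothetical stand-in data (observed here is simulated, not the question's data): the bw component of the object returned by density() is the bandwidth actually used, which is adjust times the default "nrd0" bandwidth. The estimate is also evaluated on a fixed grid of points (512 by default), which matters for the resampling step that follows.

```r
# Sketch: show how `adjust` scales the default bandwidth in density().
# `observed` is HYPOTHETICAL stand-in data; substitute your own vector.
set.seed(1)
observed <- rgamma(500, shape = 3, rate = 1)

d1  <- density(observed)                 # default bw = "nrd0", adjust = 1
d08 <- density(observed, adjust = 0.8)   # 80% of the default bandwidth

d1$bw
d08$bw
d08$bw / d1$bw       # 0.8, up to floating-point rounding

# The density is evaluated on a grid of n = 512 points by default
length(d08$x)
```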
    

    Now, resample from the density estimate to get the modeled values. We set prob=dens.obs$y so that the probability of a value in dens.obs$x being chosen is proportional to its modeled density.

    set.seed(439)
    resample.obs = sample(dens.obs$x, 1000, replace=TRUE, prob=dens.obs$y)
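    One property of sample(dens.obs$x, ...) is that every resampled value is one of the grid points in dens.obs$x, so the modeled values are discretized to at most 512 distinct values. If that matters, a common alternative for the default Gaussian kernel (a different technique, usually called the smoothed bootstrap) is to pick observed values at random and add normal noise whose SD is the bandwidth; because the Gaussian kernel estimate is an equal-weight mixture of normals centered on the observations, this draws exactly from it. A sketch, again with hypothetical stand-in data:

```r
# Sketch: sample exactly from a Gaussian kernel density estimate
# (smoothed bootstrap). `observed` is HYPOTHETICAL stand-in data.
set.seed(439)
observed <- rgamma(500, shape = 3, rate = 1)

dens.obs <- density(observed, adjust = 0.8)   # Gaussian kernel by default
n <- 1000

# Choose kernel centers uniformly from the observations, then add
# N(0, bw^2) noise; bw is the SD of the kernel used by density()
centers <- sample(observed, n, replace = TRUE)
resample.smooth <- centers + rnorm(n, mean = 0, sd = dens.obs$bw)

# Grid-based resampling as in the answer, for comparison
resample.grid <- sample(dens.obs$x, n, replace = TRUE, prob = dens.obs$y)

length(unique(resample.grid))     # at most 512 distinct values
length(unique(resample.smooth))   # continuous draws, no grid restriction
```

    Like the density estimate itself, both approaches can produce values outside the range of the data (for example, negative values for a strictly positive variable).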
    

    Put observed and modeled values in a data frame in preparation for plotting:

    dat = data.frame(value=c(observed,resample.obs), 
                     group=rep(c("Observed","Modeled"), c(length(observed),length(resample.obs))))
    

    The ECDF (empirical cumulative distribution function) plot below shows that sampling from the kernel density estimate gives samples with a distribution similar to the observed data:

    ggplot(dat, aes(value, fill=group, colour=group)) +
      stat_ecdf(geom="step") +
      theme_bw()
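    To put a number on the ECDF comparison, one option (an addition, not part of the original answer) is the largest vertical gap between the two ECDFs, which is the two-sample Kolmogorov-Smirnov statistic. A sketch with hypothetical stand-in data, reusing the answer's grid-resampling step:

```r
# Sketch: largest vertical distance between two empirical CDFs.
# `observed` is HYPOTHETICAL stand-in data; resample.obs follows the answer.
set.seed(439)
observed <- rgamma(500, shape = 3, rate = 1)
dens.obs <- density(observed, adjust = 0.8)
resample.obs <- sample(dens.obs$x, 1000, replace = TRUE, prob = dens.obs$y)

max_ecdf_gap <- function(a, b) {
  Fa <- ecdf(a)              # right-continuous step functions
  Fb <- ecdf(b)
  z <- sort(c(a, b))         # the gap can only change at pooled data points
  max(abs(Fa(z) - Fb(z)))
}

D <- max_ecdf_gap(observed, resample.obs)
D   # 0 means identical ECDFs; 1 means the samples do not overlap at all
```

    The same statistic is reported by ks.test(observed, resample.obs); with the grid-based resample it may warn about ties, which affects the p-value rather than the statistic.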
    

    You can also plot the density estimates of the observed data and of the values sampled from the modeled distribution, using the same value for the adjust parameter as above:

    ggplot(dat, aes(value, fill=group, colour=group)) +
      geom_density(alpha=0.4, adjust=0.8) +
      theme_bw()
    
