Question
I'd like to plot a weighted CDF using ggplot. Some old non-SO discussions (e.g. this one from 2012) suggest this is not possible, but I thought I'd raise it again.
For example, consider this data:
df <- data.frame(x=sort(runif(100)), w=1:100)
I can show an unweighted CDF with
library(ggplot2)
ggplot(df, aes(x)) + stat_ecdf()
How would I weight this by w? For this example, I'd expect an x^2-looking function, since the larger values of x carry higher weight.
Answer 1:
You can calculate the cumulative distribution within the data frame itself, i.e.:
df <- df[order(df$x), ] # Won't change anything since it was created sorted
df$cum.pct <- with(df, cumsum(x * w) / sum(x * w))
ggplot(df, aes(x, cum.pct)) + geom_line()
Answer 2:
There is a mistake in the answer above: it accumulates x * w instead of the weights w alone.
This is the right code to compute the weighted ECDF:
df <- df[order(df$x), ] # Won't change anything since it was created sorted
df$cum.pct <- with(df, cumsum(w) / sum(w))
ggplot(df, aes(x, cum.pct)) + geom_line()
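One caveat, assuming x can contain repeated values: cumsum(w) / sum(w) produces one point per row, so tied x values appear as several points at the same x instead of a single jump. A minimal sketch that first sums the weights at each distinct x (weighted_ecdf_table is a hypothetical helper name, not part of ggplot2):

```r
# Hypothetical helper: weighted ECDF evaluated at each distinct x,
# summing the weights of tied observations before accumulating.
weighted_ecdf_table <- function(x, w) {
  ux <- sort(unique(x))                                        # distinct x values, ascending
  w_by_x <- vapply(ux, function(u) sum(w[x == u]), numeric(1)) # total weight at each distinct x
  data.frame(x = ux, cum.pct = cumsum(w_by_x) / sum(w))
}

# Tied example: x = 1 appears twice, with weights 1 and 2
weighted_ecdf_table(c(1, 1, 2), c(1, 2, 3))
```

For the example data, which has no ties, this gives the same cum.pct column as the cumsum(w) / sum(w) line. Since an ECDF is a step function, the result can be drawn with ggplot(weighted_ecdf_table(df$x, df$w), aes(x, cum.pct)) + geom_step() rather than geom_line().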
The ECDF is the function F(a) equal to the sum of the weights (probabilities) of the observations with x ≤ a, divided by the total sum of weights.
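The definition of F(a) translates directly into code. A minimal base-R sketch, assuming non-negative weights with a positive total; weighted_ecdf and Fw are hypothetical names, not part of ggplot2 or base R. Unlike the cumsum version, the returned function can be evaluated at any a, including between observations, and it handles tied x values by construction.

```r
# Hypothetical helper: returns Fw(a) = sum(w[x <= a]) / sum(w),
# i.e. the weighted ECDF following the definition of F(a).
weighted_ecdf <- function(x, w) {
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)
  total <- sum(w)
  # Vectorised over a: for each ai, add up the weights of observations with x <= ai
  function(a) vapply(a, function(ai) sum(w[x <= ai]) / total, numeric(1))
}

set.seed(1)
df <- data.frame(x = sort(runif(100)), w = 1:100)
Fw <- weighted_ecdf(df$x, df$w)

# For sorted, distinct x this agrees with cumsum(w) / sum(w)
all.equal(Fw(df$x), cumsum(df$w) / sum(df$w))
```

The per-point loop is O(n) for each evaluation, which is fine for plotting a few hundred points; for large data the cumsum approach on sorted x is cheaper.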
But here is a more satisfying option that simply modifies the original code of the ggplot2 stat_ecdf: https://github.com/NicolasWoloszko/stat_ecdf_weighted
Source: https://stackoverflow.com/questions/32487457/r-ggplot-weighted-cdf