Question
How can I group a density plot and have the density of each group sum to one, when using weighted data?
The ggplot2 help for geom_density() suggests a hack for using weighted data: dividing by the sum of the weights. But when grouped, this means that the combined density of the groups totals one. I would like the density of each group to total one.
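To make the arithmetic concrete, here is a minimal sketch in base R. The data frame toy and its values are made up for illustration and are not taken from the movies data. Dividing by the overall sum gives weights that add up to one across both groups together, not within each group:

```r
# Hypothetical toy data: two groups with very different total votes.
toy <- data.frame(
  Action = c(0, 0, 0, 1, 1),
  rating = c(5.0, 6.5, 7.0, 4.0, 8.0),
  votes  = c(10, 30, 60, 300, 600)
)

# Global normalisation, as in aes(weight = votes / sum(votes)).
toy$w_global <- toy$votes / sum(toy$votes)

# Weight total per group: these add up to one only across both groups.
global_by_group <- tapply(toy$w_global, toy$Action, sum)
print(global_by_group)
```

With these made-up numbers the group totals are 0.1 and 0.9, so neither group's weights add up to one on their own.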
I have found two clumsy ways to do this. The first is to treat each group as a separate dataset:
library(ggplot2)  # in ggplot2 >= 2.0.0 the movies data lives in the ggplot2movies package
m <- ggplot()
m + geom_density(data = movies[movies$Action == 0, ], aes(rating, weight = votes/sum(votes)), fill = NA, colour = "black") +
  geom_density(data = movies[movies$Action == 1, ], aes(rating, weight = votes/sum(votes)), fill = NA, colour = "blue")
Obvious disadvantages are the manual handling of factor levels and aesthetics. I also tried using the windowing functionality of the data.table package to create a new column for the total votes per Action group, dividing by that instead:
library(data.table)
movies.dt <- data.table(movies)
setkey(movies.dt, Action)
# Total votes within each Action group, added as a new column by reference
movies.dt[, votes.per.group := sum(votes), by = Action]
m <- ggplot(movies.dt, aes(x = rating, weight = votes/votes.per.group, group = Action, colour = Action))
m + geom_density(fill = NA)
Are there neater ways to do this? Because of the size of my tables, I'd rather not replicate rows by their weighting for the sake of using frequency.
Answer 1:
I think an auxiliary table might be your only option. I had a similar problem here. The issue, it seems, is that when ggplot uses aggregating functions in aes(...), it applies them to the whole dataset, not the subsetted data. So when you write aes(weight=votes/sum(votes)), the votes in the numerator is subsetted based on Action, but the votes in the denominator, sum(votes), is not. The same is true for the implicit grouping with facets.
If someone else has a way around this I'd love to hear it.
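For what it's worth, the auxiliary column does not need data.table. Below is a minimal sketch using base R's ave(), on a made-up data frame (toy and its values are hypothetical stand-ins for the movies table). It precomputes the per-group weight as an ordinary column, so nothing inside aes() has to aggregate:

```r
# Hypothetical toy data standing in for the movies table.
toy <- data.frame(
  Action = factor(c(0, 0, 0, 1, 1)),
  rating = c(5.0, 6.5, 7.0, 4.0, 8.0),
  votes  = c(10, 30, 60, 300, 600)
)

# ave() applies sum() within each Action level and returns a vector
# aligned with the rows, so each row is divided by its own group total.
toy$w <- toy$votes / ave(toy$votes, toy$Action, FUN = sum)

# Each group's weights now add up to one.
group_totals <- tapply(toy$w, toy$Action, sum)
print(group_totals)

# Build the plot only if ggplot2 is installed; the weight is already a
# plain column, so aes() just refers to it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(toy, ggplot2::aes(x = rating, weight = w, colour = Action)) +
    ggplot2::geom_density(fill = NA)
}
```

This still adds one column to the table, but it avoids replicating rows and keeps the grouping and colour mapping in a single layer.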
Source: https://stackoverflow.com/questions/20342494/density-of-each-group-of-weighted-geom-density-sum-to-one