Question
I have a data set where x represents day of year (say, birthdays), and I want to create a density graph of it.
Since I also have some grouping information (say, boys or girls), I want to use ggplot2 to make the density plot.
Easy enough at first:
require(ggplot2); require(dplyr)
bdays <- data.frame(gender = sample(c('M', 'F'), 100, replace = T), bday = sample(1:365, 100, replace = T))
bdays %>% ggplot(aes(x = bday)) + geom_density(aes(color = factor(gender)))
However, this gives a poor estimate because of edge effects.
I want to treat the day of year as circular, so that 365 + 1 = 1: one day after December 31st is January 1st.
I know that the circular package provides this functionality, but I haven't had any success implementing it with a stat_function() call.
It's particularly useful for me to use ggplot2 because I want to be able to use facets, aes calls, etc.
Also, for clarification, I would like something that looks like geom_density. I am not looking for a polar plot like the one shown in "Circular density plot using ggplot2".
Answer 1:
To remove the edge effects, you could stack three copies of the data, create the density estimate, and then show the density only for the middle copy. Because the middle copy has a full copy of the data on either side of it, the density function wraps around continuously from one edge to the other.
Below is an example comparing your original plot with the new version. I've used the adjust parameter to set the same bandwidth between the two plots. Note also that in the circularized version, you'll need to renormalize the densities if you want them to integrate to 1, because each of the three copies carries only a third of the total mass.
library(ggplot2); library(dplyr)
set.seed(105)
bdays <- data.frame(gender = sample(c('M', 'F'), 100, replace = T), bday = sample(1:365, 100, replace = T))
# Stack three copies of the data, with adjusted values of bday
bdays = bind_rows(bdays, bdays, bdays)
bdays$bday = bdays$bday + rep(c(0,365,365*2),each=100)
# Function to adjust bandwidth of density plot
# Source: http://stackoverflow.com/a/24986121/496488
bw = function(b,x) b/bw.nrd0(x)
# New "circularized" version of plot
bdays %>% ggplot(aes(x = bday)) +
  geom_density(aes(color = factor(gender)), adjust=bw(10, bdays$bday[1:100])) +
  # Show only the middle copy of the data
  coord_cartesian(xlim=c(365, 365+365+1), expand=FALSE) +
  scale_x_continuous(breaks=seq(366+89, 366+365, 90), labels=seq(366+89, 366+365, 90)-365) +
  scale_y_continuous(limits=c(0,0.0016)) +
  ggtitle("Circularized")
# Original plot
ggplot(bdays[1:100,], aes(x = bday)) +
  geom_density(aes(color = factor(gender)), adjust=bw(30, bdays$bday[1:100])) +
  scale_x_continuous(breaks=seq(90,360,90), expand=c(0,0)) +
  ggtitle("Not Circularized")
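The answer mentions renormalizing but the code above doesn't do it, so here is a minimal sketch of one way it might be done. It rests on a few assumptions that are not in the original answer: a fixed bandwidth of 10 days passed through bw (instead of the adjust helper), the variable names period, stacked, wrapped_y, area and edge_gap, and a ggplot2 version recent enough to have after_stat(). Since each of the three stacked copies holds one third of the mass, multiplying the density by 3 should make the middle copy integrate to roughly 1. The base R part checks that on the pooled data, and also checks that the two edges of the year line up.

```r
library(ggplot2)

set.seed(105)
period <- 365
bdays <- data.frame(
  gender = sample(c("M", "F"), 100, replace = TRUE),
  bday   = sample(1:period, 100, replace = TRUE)
)

# Stack three shifted copies of the data (same idea as in the answer)
stacked <- rbind(bdays, bdays, bdays)
stacked$bday <- stacked$bday + rep(c(0, period, 2 * period), each = nrow(bdays))

# Base R check on the pooled data: fixed 10-day bandwidth (an assumption),
# evaluated only over the middle copy
d <- density(stacked$bday, bw = 10, from = period, to = 2 * period, n = 1024)
wrapped_y <- 3 * d$y  # each copy carries one third of the total mass

# Trapezoid-rule integral over one period; should be close to 1
area <- sum(diff(d$x) * (head(wrapped_y, -1) + tail(wrapped_y, -1)) / 2)

# Density at the two edges of the period; should nearly coincide
edge_gap <- abs(wrapped_y[1] - wrapped_y[length(wrapped_y)])

# Same rescaling inside ggplot2, applied per gender
p <- ggplot(stacked, aes(x = bday, color = gender)) +
  geom_density(aes(y = after_stat(density * 3)), bw = 10) +
  coord_cartesian(xlim = c(period, 2 * period), expand = FALSE) +
  scale_x_continuous(breaks = period + seq(90, 360, 90),
                     labels = seq(90, 360, 90)) +
  ggtitle("Circularized, renormalized")
```

Printing p draws the plot, and area and edge_gap can be inspected directly. The factor of 3 only holds when exactly three copies are stacked and the bandwidth is small compared with the period. With a very wide bandwidth, three copies may no longer be enough to hide the edges.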
Source: https://stackoverflow.com/questions/36266402/ggplot2-density-of-circular-data