My goal is to compare the distribution of various socioeconomic factors, such as income, over multiple years to see how the population of a particular region has evolved over, say, 5 years. The primary data for this comes from the Public Use Microdata Sample. I am using R + ggplot2 as my preferred tool.
When comparing two years' worth of data (2005 and 2010), I have two data frames, hh2005 and hh2010, with the household data for the two years. The income data are stored in the variable hincp in both data frames. Using ggplot2, I create the density plot for an individual year as follows (example for 2010):
library(ggplot2)

p1 <- ggplot(data = hh2010, aes(x = hincp)) +
  geom_density() +
  labs(title = "Distribution of income for 2010") +
  labs(y = "Density") +
  labs(x = "Household Income")
p1
How do I overlay the 2005 density on this plot? Having read the data in as hh2010, I am not sure how to proceed. Should I be processing the data in a fundamentally different way from the very beginning?
You can pass data arguments to individual geoms, so you should be able to add the second density as a new geom like this:
p1 <- ggplot(data = hh2010, aes(x = hincp)) +
  geom_density() +
  # Change the fill colour to differentiate it
  geom_density(data = hh2005, fill = "purple") +
  labs(title = "Distribution of income for 2005 and 2010") +
  labs(y = "Density") +
  labs(x = "Household Income")
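With a hard-coded fill colour the plot has no legend. One way to get one, as a minimal sketch, is to map a constant label to fill inside aes() for each layer. The data frames below are simulated with rlnorm() purely for illustration; they are not PUMS values, and only the column name hincp is taken from the question.

```r
library(ggplot2)

# Simulated stand-ins for the real data frames (hypothetical values)
set.seed(1)
hh2005 <- data.frame(hincp = rlnorm(500, meanlog = 10.8, sdlog = 0.7))
hh2010 <- data.frame(hincp = rlnorm(500, meanlog = 10.9, sdlog = 0.7))

# No default data set: each layer supplies its own data frame.
# Mapping a constant string to fill inside aes() creates a legend entry.
p <- ggplot(mapping = aes(x = hincp)) +
  geom_density(data = hh2005, aes(fill = "2005"), alpha = 0.5) +
  geom_density(data = hh2010, aes(fill = "2010"), alpha = 0.5) +
  labs(title = "Distribution of income for 2005 and 2010",
       x = "Household Income", y = "Density", fill = "Year")
p
```

This keeps the two data frames separate, at the cost of one geom_density() call per year.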
This is how I would approach the problem:
- Tag each data frame with the variable of interest (in this case, the year)
- Merge the two data sets
- Update the 'fill' aesthetic in the ggplot function
For example:
# tag each data frame with the year^
hh2005$year <- as.factor(2005)
hh2010$year <- as.factor(2010)
# merge the two data sets
d <- rbind(hh2005, hh2010)
d$year <- as.factor(d$year)
# update the aesthetic
p1 <- ggplot(data = d, aes(x = hincp, fill = year)) +
  geom_density(alpha = .5) +
  labs(title = "Distribution of income for 2005 and 2010") +
  labs(y = "Density") +
  labs(x = "Household Income")
p1
^ Note: ggplot2 splits the data into groups only on discrete variables, so the 'fill' aesthetic needs the year as a factor to draw one density per year; that is why I defined the years as such. I also set the transparency of the overlapping density plots with the 'alpha' parameter.
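One practical wrinkle with the rbind() step: it raises an error when the two data frames do not have the same columns. Whether that applies to your files is an assumption on my part, but it is easy to guard against by keeping only the column you need before stacking. A hedged sketch with simulated data, where extra_2010 is a made-up column that exists in only one year:

```r
library(ggplot2)

# Simulated stand-ins for the real data frames (hypothetical values)
set.seed(1)
hh2005 <- data.frame(hincp = rlnorm(500, meanlog = 10.8, sdlog = 0.7))
hh2010 <- data.frame(hincp = rlnorm(500, meanlog = 10.9, sdlog = 0.7),
                     extra_2010 = 1)  # hypothetical column absent in 2005

# Keep only the column of interest, tag each frame with its year, then stack
keep <- "hincp"
d <- rbind(
  data.frame(hh2005[keep], year = "2005"),
  data.frame(hh2010[keep], year = "2010")
)
d$year <- factor(d$year)

p <- ggplot(d, aes(x = hincp, fill = year)) +
  geom_density(alpha = 0.5) +
  labs(title = "Distribution of income for 2005 and 2010",
       x = "Household Income", y = "Density")
p
```

The same pattern extends to more than two years: build one small tagged data frame per year and rbind() them all.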
Source: https://stackoverflow.com/questions/22600390/creating-density-plots-from-two-different-data-frames-using-ggplot2