I have a data frame that has a format like the following:
Month Frequency
2007-08 2
2010-11 5
2011-01 43
2011-02 52
2011-03 31
2011
Yeah, rep solutions will waste too much memory in most interesting/large cases. The HistogramTools CRAN package includes an efficient PreBinnedHistogram function, which creates a base R histogram object directly from a list of bins and breaks, as the original question provided.
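To illustrate the idea of building a histogram object straight from bins and counts, here is a minimal base-R sketch. It does not use HistogramTools: prebinned_hist is a hypothetical helper written for this example, and the breaks 0:5 are made-up bin edges for the five frequencies shown in the question.

```r
# Hypothetical helper (not part of any package): build an object of class
# "histogram" from pre-binned counts, without expanding the data with rep().
prebinned_hist <- function(breaks, counts, xname = "x") {
  stopifnot(length(breaks) == length(counts) + 1,
            !is.unsorted(breaks, strictly = TRUE))
  widths <- diff(breaks)
  structure(
    list(
      breaks   = breaks,
      counts   = counts,
      density  = counts / (sum(counts) * widths),  # bar areas sum to 1
      mids     = breaks[-1] - widths / 2,
      xname    = xname,
      equidist = isTRUE(all.equal(widths, rep(widths[1], length(widths))))
    ),
    class = "histogram"
  )
}

# Assumed bin edges; the counts are the frequencies from the question.
h <- prebinned_hist(breaks = 0:5, counts = c(2, 5, 43, 52, 31))
plot(h, main = "Pre-binned counts")
```

Memory use here depends on the number of bins, not on the total of the frequencies.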
To get this kind of flexibility, you may have to replicate your data. Here is one way of doing it with rep:
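n <- 10
# Toy data: n distinct x values, each with a frequency f
dat <- data.frame(
  x = sort(sample(1:50, n)),
  f = sample(1:100, n))
dat
# Repeat each row f times so that every observation gets its own row
expdat <- dat[rep(1:n, times=dat$f), "x", drop=FALSE]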
Now you have your data replicated in the data.frame expdat, allowing you to call hist with different numbers of bins:
par(mfcol=c(1, 2))
hist(expdat$x, breaks=50, col="blue", main="50 bins")
hist(expdat$x, breaks=5, col="blue", main="5 bins")
par(mfcol=c(1, 1))
Take a gander at ggplot2.
If your data is in a data.frame called df:
ggplot(df,aes(x=Month,y=Frequency))+geom_bar(stat='identity')
Or, if you want a continuous time axis:
df$Month<-as.POSIXct(paste(df$Month, '01', sep='-'),format='%Y-%m-%d')
ggplot(df,aes(x=Month,y=Frequency))+geom_bar(stat='identity')
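For a version that runs on its own, here is a sketch that builds df from the rows shown in the question and plots it on a continuous date axis. It assumes ggplot2 is installed, and it swaps in as.Date, geom_col and scale_x_date in place of the as.POSIXct and geom_bar(stat='identity') calls above; that is one possible variant, not the only way to do it.

```r
library(ggplot2)

# The complete rows from the question's sample data
df <- data.frame(
  Month     = c("2007-08", "2010-11", "2011-01", "2011-02", "2011-03"),
  Frequency = c(2, 5, 43, 52, 31)
)

# Append a day so each "YYYY-MM" string can be parsed as a date
df$Month <- as.Date(paste(df$Month, "01", sep = "-"), format = "%Y-%m-%d")

# geom_col() draws bars whose heights are taken directly from Frequency
p <- ggplot(df, aes(x = Month, y = Frequency)) +
  geom_col() +
  scale_x_date(date_labels = "%Y-%m")
print(p)
```

With a date axis, the gap between 2007-08 and 2010-11 shows up as empty space instead of the bars sitting side by side.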
Another possibility is to scale down your frequency variable by some large factor so that rep doesn't have as much work to do. Then adjust the vertical axis scale of the histogram by that same factor.
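A rough sketch of that scaling idea, using made-up data: the values in dat, the factor of 1000, and the bin edges are all assumptions chosen for illustration. Here the vertical axis is adjusted by multiplying the computed counts back up before plotting, which is one way to do it.

```r
# Made-up data with large frequencies
dat <- data.frame(
  x = 1:5,
  f = c(2000, 5000, 43000, 52000, 31000)
)

scale_factor <- 1000                      # assumed factor; pick to suit your data
reps <- round(dat$f / scale_factor)       # scaled-down repeat counts
x_small <- rep(dat$x, times = reps)       # 133 values instead of 133000

# Compute the histogram on the small vector without drawing it yet
h <- hist(x_small, breaks = seq(0.5, 5.5, by = 1), plot = FALSE)

# Restore the original vertical scale, then plot
h$counts <- h$counts * scale_factor
plot(h, main = "Counts rescaled by 1000", xlab = "x")
```

Note that the rounding step loses precision: any frequency well below scale_factor rounds to zero and drops out of the plot.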