I am having trouble plotting a histogram as a PDF (probability density function).
I want the areas of all the bars to sum to one so it's easier to compare across datasets. For
The default number of breaks comes from Sturges' formula, ceiling(log2(N) + 1),
where N is 6 million in your case, so it should be about 24 (hist() treats that count as a suggestion and picks "pretty" break points near it). If you're only seeing 4 breaks, that could be because you have xlim
in your call. This doesn't change the underlying histogram; it only affects which part of it is plotted. If you do
h <- hist(data[,1], freq=FALSE, breaks=800)
sum(h$density * diff(h$breaks))
you should get a result of 1.
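The check above can be tried without the original data. This is only a sketch: the simulated vector `x` is a hypothetical stand-in for `data[,1]`, and `plot = FALSE` is used so that nothing is drawn. It also evaluates Sturges' formula for N = 6 million, which is the rule `hist()` uses when `breaks` is not given.

```r
set.seed(1)
x <- rnorm(10000)   # hypothetical stand-in for data[,1]

# Build the histogram object without drawing it; the density
# component is computed whether or not the plot is produced.
h <- hist(x, breaks = 800, plot = FALSE)

# Each bar's area is its height (density) times its width.
area <- sum(h$density * diff(h$breaks))
print(area)

# Sturges' rule for a sample of 6 million values.
sturges_6m <- ceiling(log2(6e6) + 1)
print(sturges_6m)
```

Note that `breaks = 800` is treated as a suggestion, so `length(h$breaks) - 1` may not be exactly 800, but the total area is one either way.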
The density of your data is tied to its units of measurement, so you want to make sure that "no bin height should be above 1.0" is actually meaningful. For example, suppose we have a bunch of measurements in feet and plot their histogram as a density. We then convert all the measurements to inches (by multiplying by 12) and plot another density histogram. The bars are now 12 times wider, and since the total area must still be 1, the height of the density will be 1/12th of the original even though the data is essentially the same. Similarly, you could make your bin heights all less than 1 by multiplying all your numbers by 15.
Does the value 1.0 have some kind of significance?
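To make the units point concrete, here is a small sketch using made-up measurements (the `feet` vector and the break points are assumptions chosen for illustration, not taken from the question). The same data gives bar heights above 1 in feet and below 1 in inches.

```r
set.seed(42)
feet   <- runif(1000, min = 0, max = 0.5)   # hypothetical measurements in feet
inches <- feet * 12                          # same measurements in inches

# Use matching bins in both units so the two histograms are comparable.
breaks_ft <- seq(0, 0.5, by = 0.05)
h_ft <- hist(feet,   breaks = breaks_ft,      plot = FALSE)
h_in <- hist(inches, breaks = breaks_ft * 12, plot = FALSE)

print(max(h_ft$density))   # tallest bar when measured in feet
print(max(h_in$density))   # tallest bar when measured in inches

# The inch densities are the feet densities divided by 12.
print(all.equal(h_in$density, h_ft$density / 12))
```

Both histograms still have total area 1; only the scale of the vertical axis changes with the units.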