Question
I'm playing around with R. I'm trying to visualize the distribution of 1000 dice throws with the following R script:
cases <- 1000
min <- 1
max <- 6
x <- as.integer(runif(cases,min,max+1))
mx <- mean(x)
sd <- sd(x)
hist(
  x,
  xlim=c(min - abs(mx/2),max + abs(mx/2)),
  main=paste(cases,"Samples"),
  freq = FALSE,
  breaks=seq(min,max,1)
)
curve(dnorm(x, mx, sd), add = TRUE, col="blue", lwd = 2)
abline(v = mx, col = "red", lwd = 2)
legend("bottomleft",
       legend=c(paste('Mean (', mx, ')')),
       col=c('red'), lwd=2, lty=c(1))
The script produces the following histogram:
[Figure: histogram titled "1000 Samples"]
Can someone explain to me why the first bar is so big? I've checked the data and it looks fine. How can I fix this?
Thank you in advance!
Answer 1:
Histograms aren't good for discrete data; they're designed for continuous data. Your data looks something like this:
> table(x)
x
  1   2   3   4   5   6
174 138 162 178 196 152
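That is, roughly equal numbers of each value. But when you put that in a histogram, you chose breakpoints at 1:6. By default hist() uses right-closed bins and also includes the lowest break in the first bin (right = TRUE, include.lowest = TRUE), so the bins are [1,2], (2,3], (3,4], (4,5] and (5,6]. The first bar therefore collects the 174 entries on its left limit and the 138 on its right limit, so it displays 312.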
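The binning can be checked without drawing anything, because hist(..., plot = FALSE) returns the counts and densities it would plot. The sketch below rebuilds a vector with the same counts as the table(x) output above (illustrative data, not the asker's actual rolls) and bins it with the question's breaks; cut() with include.lowest = TRUE reproduces the same intervals with readable labels. The names x and h are just illustrative.

```r
# Illustrative data: a vector with the same counts as the table(x) output above
x <- rep(1:6, times = c(174, 138, 162, 178, 196, 152))

# The question's breaks, seq(1, 6, 1); plot = FALSE returns the binning without drawing
h <- hist(x, breaks = seq(1, 6, 1), plot = FALSE)
print(h$counts)   # 312 162 178 196 152
print(h$density)  # 0.312 0.162 0.178 0.196 0.152

# Same grouping with interval labels: [1,2], (2,3], (3,4], (4,5], (5,6]
print(table(cut(x, breaks = 1:6, include.lowest = TRUE)))
```

Five breaks give only five bins for six faces, so one bin has to hold two of them, and the closed first interval makes it the first one.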
You could get a better-looking histogram by specifying breaks at the half-integers, i.e. breaks = 0:6 + 0.5, but it still doesn't make sense to use a histogram for data like this. Simply running plot(table(x)) or barplot(table(x)) gives a more accurate depiction of the data.
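A minimal sketch of both suggestions. The dice are simulated here with sample(), which is an assumption of this example (the question used as.integer(runif(...))); the seed and the variable names h and counts are arbitrary.

```r
set.seed(123)  # arbitrary seed so the example is reproducible
# Simulated rolls: sample() draws each face 1..6 with equal probability
x <- sample(1:6, 1000, replace = TRUE)

# Breaks at 0.5, 1.5, ..., 6.5 give each face its own bin, centered on the integer
h <- hist(x, breaks = 0:6 + 0.5, freq = FALSE, main = "1000 Samples")
print(h$counts)

# Frequency table; levels = 1:6 keeps a face in the table even if it never appears
counts <- table(factor(x, levels = 1:6))
barplot(counts, xlab = "Face", ylab = "Count")  # one bar per face
plot(counts, xlab = "Face", ylab = "Count")     # vertical lines, same counts
```

Building the table from a factor is a small design choice: table(x) alone silently drops a face with zero rolls, which matters for small samples.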
Answer 2:
You have incorrect breaks, and because of this the first bar is counting both the 1s and the 2s in the rolls.
hist(
  x,
  xlim=c(0,6),
  main=paste(cases,"Samples"),
  freq = FALSE,
  breaks=seq(0,6,1)
)
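A quick check of these breaks, again on a vector rebuilt from the counts in Answer 1's table(x) output (illustrative data). With seq(0, 6, 1) and hist()'s default right-closed bins, the intervals are [0,1], (1,2], ..., (5,6], so each face ends up alone in its own bin.

```r
# Illustrative data: a vector with the counts from Answer 1's table(x) output
x <- rep(1:6, times = c(174, 138, 162, 178, 196, 152))

# Bins [0,1], (1,2], ..., (5,6]: face k falls only into (k - 1, k]
h <- hist(x, breaks = seq(0, 6, 1), plot = FALSE)
print(h$counts)  # 174 138 162 178 196 152
```

One side effect: the bar for face k covers (k - 1, k], so it is drawn just to the left of the tick labelled k. Half-integer breaks such as 0:6 + 0.5 center each bar on its value instead.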
Answer 3:
m0nhawk gets to part of the problem. Another issue might be your use of as.integer, which always rounds down (and therefore skews toward 1).
as.integer(1.7)
# 1
round(1.7)
# 2
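as.integer() truncates toward zero, which for positive numbers is the same as rounding down. Whether that skews the simulated rolls depends on the interval passed to runif(): truncating a value from runif(n, 1, 7), the interval the question uses via runif(cases, min, max + 1), gives each face k the unit-width interval [k, k + 1), so all six faces are equally likely. round() over the same interval gives 1 and 7 only half-width intervals and can return 7. The sketch below tabulates all three approaches; the seed, n and variable names are arbitrary, and sample() is shown only as a common alternative.

```r
# as.integer() truncates toward zero; for positive numbers that matches floor()
as.integer(1.7)   # 1
round(1.7)        # 2
as.integer(-1.7)  # -1 (floor(-1.7) would be -2)

set.seed(42)  # arbitrary seed, only for reproducibility
n <- 1e5
u <- runif(n, 1, 7)  # same interval as runif(cases, min, max + 1) in the question

# Truncation maps each unit interval [k, k + 1) onto face k
trunc_faces <- as.integer(u)
print(round(prop.table(table(trunc_faces)), 3))

# round() sends values within 0.5 of k to k, so 1 and 7 get half-width intervals
round_faces <- round(u)
print(round(prop.table(table(round_faces)), 3))

# sample() draws the six faces directly with equal probability
dice <- sample(1:6, n, replace = TRUE)
print(round(prop.table(table(dice)), 3))
```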
Lastly, I'm not sure why one would fit a Gaussian to a uniform distribution. Generating the numbers from rnorm, rather than runif, would make more sense.
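Following the suggestion to generate the data with rnorm(), here is a sketch of the same histogram-plus-density plot for normal data. The mean 3.5 and standard deviation 1.7 are assumptions chosen to resemble a fair die (whose mean is 3.5 and whose standard deviation is sqrt(35/12), about 1.71). The data is stored in y so it is not confused with the x variable that curve() evaluates its expression over.

```r
set.seed(1)  # arbitrary seed
cases <- 1000
# Assumed parameters, loosely matching a fair die's mean and spread
y <- rnorm(cases, mean = 3.5, sd = 1.7)

# freq = FALSE plots densities, so the fitted normal curve is on the same scale
h <- hist(y, freq = FALSE, main = paste(cases, "Samples"))
curve(dnorm(x, mean(y), sd(y)), add = TRUE, col = "blue", lwd = 2)
abline(v = mean(y), col = "red", lwd = 2)

# The bar areas (density times bin width) sum to 1
print(sum(h$density * diff(h$breaks)))
```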
Source: https://stackoverflow.com/questions/43967838/why-is-the-first-bar-so-big-in-my-r-histogram