Question
The following code will return a perfectly sound sample:
b <- sample(c(0,1,2,3,4,5,6,7,8,9,10,11,12), 100000, replace=TRUE)
hist(b)
Increasing the number of elements by 1, to 14, results in this:
b <- sample(c(0,1,2,3,4,5,6,7,8,9,10,11,12,13), 100000, replace=TRUE)
hist(b)
That's clearly not correct. Zero occurs more often than it should. Is there a reason for this?
Answer 1:
The problem lies in hist, not in sample.
You can check this by tabulating a sample directly:
> table(sample(0:15, 10000, replace=T))
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
634 642 664 654 628 598 633 642 647 625 587 577 618 645 615 591
From the hist help:
If right = TRUE (default), the histogram cells are intervals of the form (a, b], i.e., they include their right-hand endpoint, but not their left one, with the exception of the first cell when include.lowest is TRUE.
For right = FALSE, the intervals are of the form [a, b), and include.lowest means ‘include highest’.
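To see the effect of that rule in numbers rather than in a plot, here is a minimal sketch. It assumes integer breaks 0:13 are passed explicitly (my choice for illustration; the breaks hist picks by default depend on the data), and the seed is arbitrary. With plot = FALSE, hist returns the cell counts without drawing anything.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible
b <- sample(0:13, 100000, replace = TRUE)

# Explicit integer breaks (an assumption for this sketch).
# With right = TRUE and include.lowest = TRUE (the defaults),
# the cells are [0, 1], (1, 2], ..., (12, 13].
h <- hist(b, breaks = 0:13, plot = FALSE)

# Count of each value 0..13, in order
per_value <- as.vector(table(factor(b, levels = 0:13)))

# The first cell holds the zeros AND the ones
h$counts[1] == per_value[1] + per_value[2]

# Every other cell holds exactly one value (2, 3, ..., 13)
all(h$counts[-1] == per_value[3:14])
```

Both comparisons should come out TRUE, which is why the first bar looks about twice as tall as the rest even though the sample itself is uniform.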
If you try
hist(sample(0:15, 10000, replace=TRUE), breaks=-1:15)
the results will look correct, because each value k then falls into its own cell (k-1, k].
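A quick way to convince yourself is to compare the histogram counts against a plain tabulation. The sketch below does that for the breaks used above, and also for half-integer breaks, which are a variation of mine rather than part of the original answer: they give the same counts but centre each bar on its integer. The seed and variable names are arbitrary.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible
x <- sample(0:15, 10000, replace = TRUE)

# Count of each value 0..15, in order
per_value <- as.vector(table(factor(x, levels = 0:15)))

# Breaks from the answer: cells [-1, 0], (0, 1], ..., (14, 15]
h1 <- hist(x, breaks = -1:15, plot = FALSE)

# Variation (not from the answer): half-integer breaks,
# cells [-0.5, 0.5], (0.5, 1.5], ..., (14.5, 15.5]
h2 <- hist(x, breaks = seq(-0.5, 15.5, by = 1), plot = FALSE)

# Both sets of cells contain exactly one integer each
all(h1$counts == per_value)
all(h2$counts == per_value)

# With half-integer breaks the cell midpoints are the integers themselves
h2$mids
```

Either choice works for counting. The half-integer version is mostly a matter of where the bars sit on the x axis.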
Source: https://stackoverflow.com/questions/28006210/sample-in-r-returning-non-random-sample-after-population-vector-length-13-w