Sample() in R returning non-random sample after population vector length > 13. Why? [duplicate]

拈花ヽ惹草 submitted on 2019-12-29 09:26:13

Question


The following code will return a perfectly sound sample:

b <- sample(c(0,1,2,3,4,5,6,7,8,9,10,11,12), 100000, replace=TRUE)
hist(b)

Increasing the number of elements by one, to 14, results in this:

b <- sample(c(0,1,2,3,4,5,6,7,8,9,10,11,12,13), 100000, replace=TRUE)
hist(b)

That's clearly not correct. Zero occurs more often than it should. Is there a reason for this?


Answer 1:


The problem lies in hist, not in sample.

You can check this by tabulating the sample directly:

> table(sample(0:15, 10000, replace=T))

  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15 
634 642 664 654 628 598 633 642 647 625 587 577 618 645 615 591 
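The same check can be made a bit more systematic. Below is a minimal sketch. The seed, the sample size and the 6-standard-deviation tolerance are arbitrary choices for illustration and do not come from the original answer. It draws from 0:13, the population from the question, and checks that every count stays close to the expected n/14. chisq.test on a one-dimensional table gives a formal goodness-of-fit test against equal probabilities.

```r
# Draw from the 14-value population that looked "wrong" in the histogram
set.seed(42)                      # arbitrary seed, for reproducibility only
n <- 100000
x <- sample(0:13, n, replace = TRUE)

# One count per value; levels = 0:13 keeps a value even if it never occurs
counts <- table(factor(x, levels = 0:13))
expected <- n / 14

# Binomial standard deviation of a single count under a uniform draw
sd_count <- sqrt(n * (1 / 14) * (13 / 14))

print(counts)

# Chi-squared goodness-of-fit test against equal probabilities
print(chisq.test(counts))

# Every count should sit within a few standard deviations of n/14
print(all(abs(counts - expected) < 6 * sd_count))
```

If sample were really biased toward 0, the count for 0 would stand out in the table. The histogram is the only place where that bias appears.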

From the hist help:

If right = TRUE (default), the histogram cells are intervals of the form (a, b], i.e., they include their right-hand endpoint, but not their left one, with the exception of the first cell when include.lowest is TRUE.

For right = FALSE, the intervals are of the form [a, b), and include.lowest means ‘include highest’.
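To see these rules in action, hist can be called with plot = FALSE. It then returns the computed breaks and counts instead of drawing anything. The sketch below assumes the default arguments (Sturges breaks, right = TRUE, include.lowest = TRUE), and the seed is arbitrary. If the breaks come out as the integers 0, 1, ..., 13, the table for the first cell should list both 0 and 1.

```r
set.seed(1)                        # arbitrary seed
x <- sample(0:13, 100000, replace = TRUE)

# Compute the histogram without drawing it, using the defaults
# (breaks = "Sturges", right = TRUE, include.lowest = TRUE)
h <- hist(x, plot = FALSE)
print(h$breaks)

# With right = TRUE and include.lowest = TRUE the first cell is closed on
# both ends, [breaks[1], breaks[2]]; every later cell is (a, b]
first_cell <- x >= h$breaks[1] & x <= h$breaks[2]

# Which values end up in the first bar
print(table(x[first_cell]))

# The first bar's count is exactly the number of values in that closed cell
print(h$counts[1] == sum(first_cell))
```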

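In the 14-value case, the default breaks fall on the integers 0, 1, ..., 13, so the first cell is [0, 1]. Because include.lowest is TRUE, that cell is closed on both ends. It therefore counts both the 0s and the 1s, which is why the leftmost bar is about twice as tall as the others.

If you try

hist(sample(0:15, 10000, replace=TRUE), breaks=-1:15)

the results will look correct, because each integer k then gets a cell (k-1, k] of its own.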
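Another option for integer data is to centre the cells on the integers. The breaks then sit halfway between values, so no observation ever lands on a boundary. This is a sketch under that assumption. The helper name int_hist is hypothetical, not part of base R, and the seed is arbitrary.

```r
# Hypothetical helper: histogram of integer-valued data with one cell per value.
# Breaks are placed halfway between consecutive integers, so no value can sit
# on a cell boundary and the right/include.lowest rules never come into play.
int_hist <- function(x, ...) {
  breaks <- seq(min(x) - 0.5, max(x) + 0.5, by = 1)
  hist(x, breaks = breaks, ...)
}

set.seed(7)                        # arbitrary seed
x <- sample(0:13, 100000, replace = TRUE)

# plot = FALSE is forwarded to hist() through ...; drop it to draw the plot
h <- int_hist(x, plot = FALSE)

# One count per integer 0..13, matching table(x)
print(h$counts)
print(as.vector(table(x)))

# Alternatively, skip binning entirely and draw one bar per distinct value:
# barplot(table(x))
```

Compared with breaks=-1:15, centring the cells puts each bar directly above its value on the x axis. barplot(table(x)) avoids the question of breaks altogether.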



Source: https://stackoverflow.com/questions/28006210/sample-in-r-returning-non-random-sample-after-population-vector-length-13-w
