I am trying to generate a random sample that excludes certain "bad data." I do not know whether the data is "bad" until after I sample it. Thus, I need to make a rando
Here is a general use of a while loop:
random.sample <- function(df) {
  success <- FALSE
  while (!success) {
    # draw one random row index and take that row
    i <- sample(nrow(df), 1)
    x <- df[i, ]
    # check for success
    success <- x$SCORE > 0
  }
  return(x)
}
An alternative is to use repeat (syntactic sugar for while(TRUE)) and break:
random.sample <- function(df) {
  repeat {
    # draw one random row index and take that row
    i <- sample(nrow(df), 1)
    x <- df[i, ]
    # exit if the condition is met
    if (x$SCORE > 0) break
  }
  return(x)
}
where break makes you exit the repeat block. Alternatively, you could have if (x$SCORE > 0) return(x) to exit the function directly.
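Both loops above keep drawing forever if no row in df has SCORE > 0. A minimal sketch of a bounded version follows; the name random.sample.bounded and the max.tries parameter are hypothetical additions for illustration, not part of the answer above, and isTRUE() is used so that an NA score counts as a failed draw.

```r
# Hypothetical helper: rejection sampling with a retry limit.
# `max.tries` is an assumed parameter, not part of the original answer.
random.sample.bounded <- function(df, max.tries = 1000) {
  for (attempt in seq_len(max.tries)) {
    x <- df[sample(nrow(df), 1), ]
    # isTRUE() treats an NA comparison as "not good" instead of erroring
    if (isTRUE(x$SCORE > 0)) return(x)
  }
  stop("no row with SCORE > 0 found in ", max.tries, " tries")
}

set.seed(1)
df <- data.frame(NAME = rep(c("Frank", "Mary"), each = 10), SCORE = rnorm(20))
random.sample.bounded(df)
```

The limit is arbitrary; it only turns a silent infinite loop into an error you can see.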
You can select the eligible rows first and sample from them directly (here, 5 rows):
> df <- data.frame(NAME=c(rep('Frank',10),rep('Mary',10)), SCORE=rnorm(20))
> df[sample(which(df$SCORE>0), 5),]
NAME SCORE
14 Mary 1.0858854
10 Frank 0.7037989
16 Mary 0.7688913
5 Frank 0.2067499
17 Mary 0.4391216
This samples without replacement; for a bootstrap, pass replace=TRUE.
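As a sketch of the bootstrap variant, the example below draws 5 rows with replacement from the good rows only. The seed, the variable names good and boot, and the use of sample.int() are my own choices here. Indexing by position sidesteps the documented sample() behavior where a single number k is treated as 1:k, which would matter if only one row had a positive score.

```r
set.seed(42)
df <- data.frame(NAME = rep(c("Frank", "Mary"), each = 10), SCORE = rnorm(20))

good <- which(df$SCORE > 0)   # row indices of the acceptable rows

# Bootstrap-style draw: 5 rows with replacement from the good rows only.
# sample.int() picks positions within `good`, so a length-1 `good`
# is never expanded to 1:k the way sample(good, ...) would be.
boot <- df[good[sample.int(length(good), 5, replace = TRUE)], ]
boot
```

If no row qualifies, length(good) is 0 and sample.int() raises an error, so this version fails loudly instead of returning bad rows.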
random.sample <- function(df) {
  x <- df[sample(nrow(df), 1), ]
  if (x$SCORE > 0) return(x)
  Recall(df) # run the function again on the same data frame
}
random.sample(df)
# NAME SCORE
#14 Mary 1.252566
It seems to me that this should work as well:
df$SCORE[ df$SCORE > 0 ][ sample(1:sum(df$SCORE > 0), 1) ]
#[1] 0.6579631
Use this after your first sample (x is the sample, df the full data):
while (any(bad <- (x$SCORE <= 0)))
x[bad, ] <- df[sample(nrow(df), sum(bad)), ]
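For context, here is a self-contained sketch of that loop with a first sample in front of it. The sample size n, the seed, and the example data are assumptions made for illustration.

```r
set.seed(7)
df <- data.frame(NAME = rep(c("Frank", "Mary"), each = 10), SCORE = rnorm(20))

n <- 5                            # assumed sample size
x <- df[sample(nrow(df), n), ]    # first sample, may contain bad rows

# Redraw only the bad rows until none remain.
# Note: the row names of x are kept from the first draw, so they no
# longer identify the source rows after a replacement.
while (any(bad <- (x$SCORE <= 0)))
  x[bad, ] <- df[sample(nrow(df), sum(bad)), ]

x
```

Two things to keep in mind with this approach: each redraw ignores the rows already in x, so the final sample can contain the same row more than once, and the loop never ends if df has no row with SCORE > 0.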