Question
I am trying to sample between a range of values as part of a larger loop in R. As the loop progresses to each row j, I want to sample a number between the value given in the start column and the value given in the end column, placing that value in the sampled column for that row.
The results should look something like this:
ID start end sampled
a 25 67 44
b 36 97 67
c 23 85 77
d 15 67 52
e 21 52 41
f 43 72 66
g 39 55 49
h 27 62 35
i 11 99 17
j 21 89 66
k 28 65 48
l 44 58 48
m 16 77 22
n 25 88 65
I started using mapply, which samples the whole df, but then I'm trying to fit all 14 sampled values into a single row.
df[j,4] <- mapply(function(x, y) sample(seq(x, y), 1), df$start, df$end)
I thought maybe something using seq might work, but this results in errors saying that from must be of length 1.
df[j,4] <- sample(seq(df$start, df$end),1,replace=TRUE)
The outer looping structure is pretty complicated so I haven't included it here, but the df[j,4] part of the code is necessary because it is part of a larger loop. There are situations where rows have to be resampled based on additional dependencies in the actual dataset. For example, the sampled value of a might need to be larger than b. The rest of the code updates the sampled column, checks for dependencies, and will rerun the sample if the dependencies aren't met. If I can get this sampling section to work, I should be able to plug it in without too much trouble (I hope).
Here's a sample data set.
structure(list(ID = c("a", "b", "c", "d", "e", "f", "g", "h",
"i", "j", "k", "l", "m", "n"), start = c(25, 36, 23, 15, 21,
43, 39, 27, 11, 21, 28, 44, 16, 25), end = c(67, 97, 85, 67,
52, 72, 55, 62, 99, 89, 65, 58, 77, 88), sampled = c(NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L), spec = structure(list(
cols = list(ID = structure(list(), class = c("collector_character",
"collector")), start = structure(list(), class = c("collector_double",
"collector")), end = structure(list(), class = c("collector_double",
"collector")), sampled = structure(list(), class = c("collector_logical",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
Answer 1:
First, put the data in a format that is easier to use with dput(df):
df <- structure(list(ID = structure(1:14, .Label = c("a", "b", "c",
"d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n"), class = "factor"),
start = c(25L, 36L, 23L, 15L, 21L, 43L, 39L, 27L, 11L, 21L,
28L, 44L, 16L, 25L), end = c(67L, 97L, 85L, 67L, 52L, 72L,
55L, 62L, 99L, 89L, 65L, 58L, 77L, 88L), sampled = c(44L,
67L, 77L, 52L, 41L, 66L, 49L, 35L, 17L, 66L, 48L, 48L, 22L,
65L)), class = "data.frame", row.names = c(NA, -14L))
You were very close with mapply(), but you made it harder than it needs to be:
df$sampled <- mapply(function(x, y) sample(seq(x, y), 1), df$start, df$end)
df
# ID start end sampled
# 1 a 25 67 67
# 2 b 36 97 86
# 3 c 23 85 54
# 4 d 15 67 36
# 5 e 21 52 37
# 6 f 43 72 60
# 7 g 39 55 44
# 8 h 27 62 37
# 9 i 11 99 86
# 10 j 21 89 52
# 11 k 28 65 65
# 12 l 44 58 51
# 13 m 16 77 62
# 14 n 25 88 31
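One caveat with sample(seq(x, y), 1): when start equals end, seq(x, y) has length one, and sample() then treats that single number n as the range 1:n instead of returning it. Below is a minimal sketch of a helper that avoids this, assuming whole-number bounds with start <= end. The name sample_between and the three-row data frame are hypothetical, chosen only to show the edge case.

```r
# Hypothetical helper: draw one integer uniformly from start..end (inclusive).
# Assumes whole-number bounds with start <= end.
sample_between <- function(start, end) {
  # sample.int(n, 1) draws from 1..n, so shift the result to start..end.
  # This sidesteps sample(x, 1) treating a single number x as 1:x.
  start + sample.int(end - start + 1, 1) - 1
}

# Small illustrative data set; row "c" has start == end on purpose.
df <- data.frame(
  ID    = c("a", "b", "c"),
  start = c(25, 36, 40),
  end   = c(67, 97, 40)
)

set.seed(1)
df$sampled <- mapply(sample_between, df$start, df$end)
df
```

With this helper, a row whose start and end are both 40 always gets 40, whereas sample(seq(40, 40), 1) could return any value from 1 to 40.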
Answer 2:
You might not need to loop at all. If all you need is a value between start and end, that is almost equivalent to sampling a number between 0 and 1, multiplying it by the width of the range, and adding start.
library(dplyr)
df %>% mutate(sampled = start + round((end - start) * runif(nrow(.))))
Regarding the updating and dependencies you mentioned in your comment: that sounds a bit complicated. A quick thought: it might be faster to sample many times and keep a draw that meets your criteria.
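Note that rounding a uniform draw gives the two endpoints only half the probability of the interior values. If every integer in the range should be equally likely, one option is to floor a draw taken over a range widened by one. The following is a base R sketch under the assumption of whole-number bounds with start <= end; sample_range_vec is a hypothetical name, not part of any package.

```r
# Hypothetical vectorized helper: one integer per row, uniform on start..end.
sample_range_vec <- function(start, end) {
  n <- length(start)
  # runif() is vectorized over min and max; flooring a draw from
  # [start, end + 1) gives each integer in start..end an interval of width 1.
  draws <- floor(runif(n, min = start, max = end + 1))
  # Guard against a draw landing exactly on the upper limit.
  pmin(draws, end)
}

df <- data.frame(start = c(25, 36, 23), end = c(67, 97, 85))

set.seed(42)
df$sampled <- sample_range_vec(df$start, df$end)
df
```

This fills the whole column in one call, so it suits the case where no row-by-row loop is needed.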
Answer 3:
Figured it out.
df[j,4] <- mapply(function(x, y) sample(seq(x, y), 1), df[j,"start"], df[j,"end"])
I just needed to be specific about which row of the sampled values I wanted to enter into df[j,4]. Specifying row j for columns start and end did the trick.
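For context, here is a rough sketch of how a per-row draw might sit inside a loop that resamples when a dependency fails. The actual outer loop is not shown in the question, so everything here is an assumption: the helper draw_row, the rule that the value for a must exceed the value for b, and the cap of 1000 attempts are all hypothetical.

```r
# Illustrative data: three rows taken from the question's sample data set.
df <- data.frame(
  ID      = c("a", "b", "c"),
  start   = c(25, 36, 23),
  end     = c(67, 97, 85),
  sampled = NA_real_
)

# Hypothetical helper: draw one integer for row j from its own start..end range.
draw_row <- function(df, j) {
  df$start[j] + sample.int(df$end[j] - df$start[j] + 1, 1) - 1
}

set.seed(123)

# First pass: fill the sampled column one row at a time.
for (j in seq_len(nrow(df))) {
  df[j, "sampled"] <- draw_row(df, j)
}

# Hypothetical dependency: the value for "a" must exceed the value for "b".
# Resample both rows until it holds, with a cap so the loop cannot run forever.
a <- which(df$ID == "a")
b <- which(df$ID == "b")
attempts <- 0
while (df$sampled[a] <= df$sampled[b] && attempts < 1000) {
  df[a, "sampled"] <- draw_row(df, a)
  df[b, "sampled"] <- draw_row(df, b)
  attempts <- attempts + 1
}
df
```

Indexing with df$start[j] rather than df[j, "start"] returns a plain number for both data frames and tibbles, which keeps the arithmetic straightforward.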
Source: https://stackoverflow.com/questions/58653809/sample-using-start-and-end-values-within-a-loop-in-r