Sample using start and end values within a loop in R

Submitted by 让人想犯罪 on 2021-02-05 07:53:36

Question


I am trying to sample between a range of values as part of a larger loop in R. As the loop progresses to each row j, I want to sample a number between the value given in the start column and the value given in the end column, placing that value in the sampled column for that row.

The results should look something like this:

ID  start  end  sampled
a   25     67   44
b   36     97   67
c   23     85   77
d   15     67   52
e   21     52   41
f   43     72   66
g   39     55   49
h   27     62   35
i   11     99   17
j   21     89   66
k   28     65   48
l   44     58   48
m   16     77   22
n   25     88   65

I started with mapply, which samples a value for every row of df at once, but then I end up trying to fit all 14 sampled values into a single row.

df[j,4] <- mapply(function(x, y) sample(seq(x, y), 1), df$start, df$end)

I thought maybe something using seq might work, but this results in errors saying that from must be of length 1.

df[j,4] <- sample(seq(df$start, df$end),1,replace=TRUE)

The outer looping structure is pretty complicated so I haven't included it here, but the df[j,4] part of the code is necessary because it is part of a larger loop. There are situations where rows have to be resampled based on additional dependencies in the actual dataset. For example, the sampled value of a might need to be larger than b. The rest of the code updates the sampled column, checks for dependencies, and will rerun the sample if the dependencies aren't met. If I can get this sampling section to work, I should be able to plug it in without too much trouble (I hope).

Here's a sample data set.

structure(list(ID = c("a", "b", "c", "d", "e", "f", "g", "h", 
"i", "j", "k", "l", "m", "n"), start = c(25, 36, 23, 15, 21, 
43, 39, 27, 11, 21, 28, 44, 16, 25), end = c(67, 97, 85, 67, 
52, 72, 55, 62, 99, 89, 65, 58, 77, 88), sampled = c(NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L), spec = structure(list(
    cols = list(ID = structure(list(), class = c("collector_character", 
    "collector")), start = structure(list(), class = c("collector_double", 
    "collector")), end = structure(list(), class = c("collector_double", 
    "collector")), sampled = structure(list(), class = c("collector_logical", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1), class = "col_spec"))

Answer 1:


First, share the data in a form that is easier to use, for example the output of dput(df) on a plain data frame:

df <- structure(list(ID = structure(1:14, .Label = c("a", "b", "c", 
    "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n"), class = "factor"), 
    start = c(25L, 36L, 23L, 15L, 21L, 43L, 39L, 27L, 11L, 21L, 
    28L, 44L, 16L, 25L), end = c(67L, 97L, 85L, 67L, 52L, 72L, 
    55L, 62L, 99L, 89L, 65L, 58L, 77L, 88L), sampled = c(44L, 
    67L, 77L, 52L, 41L, 66L, 49L, 35L, 17L, 66L, 48L, 48L, 22L, 
    65L)), class = "data.frame", row.names = c(NA, -14L))

You were very close with mapply(), but you made it harder than it needs to be. Assign the result to the whole sampled column instead of to a single row:

df$sampled <- mapply(function(x, y) sample(seq(x, y), 1), df$start, df$end)
df
#    ID start end sampled
# 1   a    25  67      67
# 2   b    36  97      86
# 3   c    23  85      54
# 4   d    15  67      36
# 5   e    21  52      37
# 6   f    43  72      60
# 7   g    39  55      44
# 8   h    27  62      37
# 9   i    11  99      86
# 10  j    21  89      52
# 11  k    28  65      65
# 12  l    44  58      51
# 13  m    16  77      62
# 14  n    25  88      31
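
One caveat about the sample(seq(x, y), 1) idiom: when start equals end, seq(x, y) has length one, and sample() then draws from 1:x instead of returning x (this is documented behaviour of sample()). The sample data has no such row, so the following is only a defensive sketch. The helper name sample_between and the three-row data frame are assumptions made for illustration, not part of the original answer.

```r
# Hypothetical helper: draw one integer uniformly from lo..hi (inclusive).
# sample.int(n, 1) draws from 1..n, so shifting by lo - 1 avoids the
# sample(x, 1) surprise when lo == hi.
sample_between <- function(lo, hi) {
  lo - 1 + sample.int(hi - lo + 1, 1)
}

# Illustrative data: row "c" deliberately has start == end.
df <- data.frame(ID    = c("a", "b", "c"),
                 start = c(25, 36, 50),
                 end   = c(67, 97, 50))

set.seed(1)  # only so the sketch is reproducible
df$sampled <- mapply(sample_between, df$start, df$end)
```

With this helper, the third row should always come back as 50, whereas sample(seq(50, 50), 1) could return any value from 1 to 50.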



Answer 2:


You might not need to loop at all. If all you need is a value between start and end, that is almost equivalent to drawing a uniform number between 0 and 1, multiplying it by the width of the range, and adding start.

library(dplyr)  # provides %>% and mutate()
df %>% mutate(sampled = start + round((end-start)*runif(nrow(.))))
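
The "almost" matters: with round(), the two endpoints each receive only half the probability of the interior values. If every integer from start to end should be equally likely, one possible variant (a base R sketch, not the answerer's code) uses floor() over a range widened by one. The vectors below are just the first four rows of the sample data.

```r
start <- c(25, 36, 23, 15)
end   <- c(67, 97, 85, 67)

set.seed(42)  # only so the sketch is reproducible

# runif() draws from the open interval (0, 1), so floor(u * k) is
# uniform on 0..(k - 1); with k = end - start + 1 the result covers
# start..end inclusive, each integer with equal probability.
u <- runif(length(start))
sampled <- start + floor(u * (end - start + 1))
```

Because everything here is vectorised, one call handles all rows at once and no mapply() is needed.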

Regarding the updating and the dependencies you mentioned in your comment: that sounds a bit complicated. A quick thought: it might be faster to sample many times and keep a draw that meets your criteria.
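
As a rough sketch of that idea, the loop below redraws the whole sampled column until a dependency holds. The dependency used here (row a must be larger than row b) is taken from the example in the question; the function name constraint_ok, the two-row data frame, and the max_tries cap are assumptions made for illustration.

```r
df <- data.frame(ID      = c("a", "b"),
                 start   = c(25, 36),
                 end     = c(67, 97),
                 sampled = NA_real_)

# Hypothetical dependency check: row "a" must end up larger than row "b".
constraint_ok <- function(d) {
  d$sampled[d$ID == "a"] > d$sampled[d$ID == "b"]
}

set.seed(7)        # only so the sketch is reproducible
max_tries <- 1000  # guard against a constraint that can never be met

for (attempt in seq_len(max_tries)) {
  # Redraw every row, then keep the draw only if the dependency holds.
  df$sampled <- mapply(function(x, y) sample(seq(x, y), 1), df$start, df$end)
  if (constraint_ok(df)) break
}
```

Whether this beats row-by-row resampling depends on how often a random draw satisfies the real dependencies; if acceptable draws are rare, the cap will be hit and a targeted approach is probably the better choice.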




Answer 3:


Figured it out.

df[j,4] <- mapply(function(x, y) sample(seq(x, y), 1), df[j,"start"], df[j,"end"])

I just needed to be specific as to which row of the sampled values I wanted to enter into df[j,4]. Specifying row j for columns start and end did the trick.
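
The reason this works is that restricting start and end to row j gives seq() a single from and a single to, which is what its "from must be of length 1" error was asking for. A minimal sketch of the same fix inside a loop follows, assuming a small illustrative data frame and a plain for loop standing in for the outer structure that the question leaves out. It uses [[ ]] instead of mapply(), since [[ ]] returns a plain vector for both data frames and tibbles.

```r
df <- data.frame(ID      = c("a", "b", "c"),
                 start   = c(25, 36, 23),
                 end     = c(67, 97, 85),
                 sampled = NA_real_)

set.seed(123)  # only so the sketch is reproducible

for (j in seq_len(nrow(df))) {
  # Pull out scalar endpoints for row j, so seq() receives a single
  # 'from' and a single 'to'.
  lo <- df[["start"]][j]
  hi <- df[["end"]][j]
  df[["sampled"]][j] <- sample(seq(lo, hi), 1)
}
```

A dependency check for row j could then be placed inside the loop body, redrawing that row when the check fails.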



Source: https://stackoverflow.com/questions/58653809/sample-using-start-and-end-values-within-a-loop-in-r
