random-sample

Weighted sampling without replacement

ぐ巨炮叔叔 submitted on 2019-12-01 13:22:04
I have a population p of indices and corresponding weights in a vector w. I want to draw k samples from this population without replacement, where each selection is made at random with probability proportional to the weights. I know that randsample can be used for selection with replacement by saying

    J = randsample(p,k,true,w)

but when I call it with false instead of true, I get

    ??? Error using ==> randsample at 184
    Weighted sampling without replacement is not supported.

I wrote my own function as discussed here:

    p = 1:n;
    J = zeros(1,k);
    for i = 1:k
        J(i) = randsample(p,1,true,w);
        w(p == J(i)) = 0;
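The draw-one-item-then-zero-its-weight loop from the MATLAB snippet can be sketched in Python as follows. This is a minimal illustration, not the asker's final function: the name weighted_sample_without_replacement and the optional rng argument are assumptions made for the example, and random.choices stands in for randsample(p,1,true,w).

```python
import random


def weighted_sample_without_replacement(population, weights, k, rng=None):
    """Draw k distinct items; each draw is proportional to the remaining weights."""
    rng = rng or random.Random()
    items = list(population)
    remaining = list(weights)
    if k > sum(1 for w in remaining if w > 0):
        raise ValueError("k exceeds the number of items with positive weight")
    chosen = []
    for _ in range(k):
        # One weighted draw over the indices (the role randsample(p,1,true,w) plays).
        idx = rng.choices(range(len(items)), weights=remaining, k=1)[0]
        chosen.append(items[idx])
        # Zero the weight so this item cannot be drawn again.
        remaining[idx] = 0
    return chosen
```

Each pass rescans the weights, so the sketch costs on the order of k times the population size; that is usually acceptable for small k.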

Stratified sample when some strata are too small

99封情书 submitted on 2019-12-01 11:58:28
I need to draw a stratified sample with n observations in each stratum, but some strata have fewer than n observations. If a stratum has too few observations (say, k<n observations), I want to sample all k observations from that stratum.

    require(sampling)
    n <- 10
    geo_ID <- c(rep(1, times = 20), rep(2, times = 20), rep(c(1, 2, 3, 4), times = 5))
    set.seed(42)
    V1 <- rnorm(60, 0, 1)
    V2 <- rnorm(60, 2, 1)
    DF <- data.frame(geo_ID = geo_ID, V1 = V1, V2 = V2)
    # Sort as explained in ?strata help file
    DF <- DF[order(DF[, "geo_ID"]), ]
    strata(DF, stratanames = "geo_ID", size = c(n, n, n, n), method =
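The rule "take n per stratum, or the whole stratum when it has fewer than n rows" amounts to capping each stratum's sample size at min(n, stratum size). A minimal Python sketch of that idea, assuming rows are plain values and the stratum label comes from a caller-supplied key function (stratified_sample, key and rng are names made up for this example, not part of the sampling package):

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, n, rng=None):
    """Sample up to n rows per stratum; small strata are taken whole."""
    rng = rng or random.Random()
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample = []
    for label in sorted(strata):
        members = strata[label]
        # Cap the request at the stratum size so sampling never fails.
        sample.extend(rng.sample(members, min(n, len(members))))
    return sample


# Same stratum layout as the geo_ID vector in the question: 25, 25, 5 and 5 rows.
geo_id = [1] * 20 + [2] * 20 + [1, 2, 3, 4] * 5
rows = list(enumerate(geo_id))  # (row index, stratum label) pairs
picked = stratified_sample(rows, key=lambda r: r[1], n=10)
```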

Weighted sampling without replacement

[亡魂溺海] submitted on 2019-12-01 11:02:31
Question: I have a population p of indices and corresponding weights in a vector w. I want to draw k samples from this population without replacement, where each selection is made at random with probability proportional to the weights. I know that randsample can be used for selection with replacement by saying J = randsample(p,k,true,w), but when I call it with false instead of true, I get "??? Error using ==> randsample at 184 Weighted sampling without replacement is not supported." I wrote my own function as

Stratified sample when some strata are too small

你。 submitted on 2019-12-01 10:11:34
Question: I need to draw a stratified sample with n observations in each stratum, but some strata have fewer than n observations. If a stratum has too few observations (say, k<n observations), I want to sample all k observations from that stratum.

    require(sampling)
    n <- 10
    geo_ID <- c(rep(1, times = 20), rep(2, times = 20), rep(c(1, 2, 3, 4), times = 5))
    set.seed(42)
    V1 <- rnorm(60, 0, 1)
    V2 <- rnorm(60, 2, 1)
    DF <- data.frame(geo_ID = geo_ID, V1 = V1, V2 = V2)
    # Sort as explained in ?strata help file

Generating random numbers (0 and 1) given specific probability values in R

梦想的初衷 submitted on 2019-12-01 07:17:05
Question: I could not find an answer to this question for R. I would like to generate a random sample of 0s and 1s, 'RandomSample'. Each sample should contain a specific number of values, 'numval', which is derived from the length of the vector 'Prob'. 'Prob' gives the probability that each individual point will be 0 or 1. So in this instance the first number has a probability of 0.9 of being 1 and 0.1 of being 0, and so on. Then I would like to repeat the random sample generation 1000 times. I have
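What the question describes is one independent Bernoulli draw per entry of 'Prob', repeated 1000 times. A minimal Python sketch under that reading; only the first probability (0.9) comes from the question, while the remaining values in prob and the names bernoulli_samples and rng are assumptions for illustration:

```python
import random


def bernoulli_samples(prob, repeats, rng=None):
    """Return `repeats` rows; entry j of each row is 1 with probability prob[j]."""
    rng = rng or random.Random()
    return [[1 if rng.random() < p else 0 for p in prob] for _ in range(repeats)]


prob = [0.9, 0.5, 0.2]  # only 0.9 appears in the question; the rest are made up
samples = bernoulli_samples(prob, repeats=1000)
numval = len(prob)  # each row holds numval values
```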

how to generate random numbers with a specified lognormal distribution in R?

只谈情不闲聊 submitted on 2019-11-30 21:35:53
I would like to get 20 randomly generated numbers from a lognormal distribution with a geometric mean of 10 and a geometric standard deviation of 2.5. Which R function should I use to accomplish this task? Thank you for your help!

The rlnorm function:

    rlnorm(20, log(10), log(2.5))

More generally, distributions in R are available in d, p, q, r forms, with those letters coming first, followed by the distribution stem: norm, lnorm, unif, gamma, ... etc. Their help pages will contain the specifics of the parameters, which can be essential if working with weibull or other distribution for
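The reason log(10) and log(2.5) are passed is that the lognormal's parameters are the mean and standard deviation of the logarithm of the values, so a geometric mean g and geometric standard deviation s translate to meanlog = log(g) and sdlog = log(s). A small Python sketch of the same conversion, using the standard library's random.lognormvariate as a stand-in for rlnorm (lognormal_sample and rng are names chosen for this example):

```python
import math
import random


def lognormal_sample(size, geo_mean, geo_sd, rng=None):
    """Draw lognormal values specified by geometric mean and geometric SD."""
    rng = rng or random.Random()
    # Convert to the mean and SD of log(X), which is what the sampler expects.
    mu, sigma = math.log(geo_mean), math.log(geo_sd)
    return [rng.lognormvariate(mu, sigma) for _ in range(size)]


values = lognormal_sample(20, geo_mean=10, geo_sd=2.5)
```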

Generating Random Date time in java (joda time)

我的梦境 submitted on 2019-11-30 19:24:23
Is it possible to generate a random datetime using Joda-Time such that the datetime has the format yyyy-MM-dd HH:MM:SS, and to generate two random datetimes where Date2 minus Date1 is greater than 2 minutes but less than 60 minutes? Please suggest a method.

This follows quite strictly what you asked for (except for the corrected format).

    Random random = new Random();
    DateTime startTime = new DateTime(random.nextLong()).withMillisOfSecond(0);
    Minutes minimumPeriod = Minutes.TWO;
    int minimumPeriodInSeconds = minimumPeriod.toStandardSeconds().getSeconds();
    int
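The question is about Joda-Time in Java; the sketch below only restates the same constraint with Python's standard datetime module, not Joda-Time. It picks a random start at whole-second resolution and adds a gap of 121 to 3599 seconds, so the difference is strictly between 2 and 60 minutes. The 2000 to 2030 range and the name random_datetime_pair are assumptions made for the example.

```python
import random
from datetime import datetime, timedelta


def random_datetime_pair(rng=None):
    """Return (start, end) with 2 min < end - start < 60 min, whole seconds only."""
    rng = rng or random.Random()
    # Hypothetical range for the start: 2000-01-01 up to (not including) 2030-01-01.
    base = datetime(2000, 1, 1)
    span_seconds = int((datetime(2030, 1, 1) - base).total_seconds())
    start = base + timedelta(seconds=rng.randrange(span_seconds))
    # 121..3599 seconds keeps the gap strictly inside the open interval.
    gap = timedelta(seconds=rng.randint(121, 3599))
    return start, start + gap


start, end = random_datetime_pair()
# %M is minutes and %m is months here, the Python analogue of yyyy-MM-dd HH:mm:ss.
formatted = (start.strftime("%Y-%m-%d %H:%M:%S"), end.strftime("%Y-%m-%d %H:%M:%S"))
```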

pandas create a series with n elements (sequential or randbetween)

霸气de小男生 submitted on 2019-11-30 19:17:17
I am trying to create a pandas series. One column of the series should contain n sequential numbers: [1, 2, 3, ..., n]. One column should contain random numbers between k and k+100. One column should contain a random selection between strings in a list: ['A', 'B', 'C', ... 'Z'].

jezrael

There can be a lot of solutions. In the comments of the code block ( # ) you will find a few links for more information:

    import pandas as pd
    import numpy as np
    import random
    import string

    k = 5
    N = 10
    #http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.randint.html
    #http://stackoverflow.com/a/2257449
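Since the question asks for several columns, the natural container is a DataFrame rather than a single Series. One possible sketch, which is not necessarily the approach the truncated answer takes: the column names seq, rand and letter and the helper make_frame are assumptions for this example, and "between k and k+100" is read as inclusive on both ends.

```python
import string

import numpy as np
import pandas as pd


def make_frame(n, k, seed=None):
    """Build n rows: a 1..n counter, random ints in [k, k+100], random letters."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "seq": np.arange(1, n + 1),
        # The upper bound of integers() is exclusive, hence k + 101.
        "rand": rng.integers(k, k + 101, size=n),
        "letter": rng.choice(list(string.ascii_uppercase), size=n),
    })


df = make_frame(n=10, k=5)
```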

What does the integer while setting the seed mean?

余生颓废 submitted on 2019-11-30 08:51:06
I want to randomly select n rows from my data set using the sample() function in R. I was getting different output each time, so I used the set.seed() function to get the same output. I know that each integer passed to set.seed() will give me a unique output, and that the output will be the same if I set the same seed. But I cannot work out what the integer passed as a parameter to set.seed() means. Is it just an index that goes into the random generator algorithm, or does it refer to some part of the data from which sampling starts? For example, what does the 2 in set.seed(2
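A seed only fixes the starting state of the pseudo-random number generator; it says nothing about the data. The Python sketch below uses the standard library's generator as a stand-in for R's to show the behaviour the question describes: the same seed reproduces the same selection, and the seed is never compared with or looked up in the rows. sample_rows is a name invented for this example.

```python
import random


def sample_rows(rows, n, seed):
    """Pick n rows using a generator initialised from `seed`."""
    rng = random.Random(seed)  # the seed sets the generator's state, nothing else
    return rng.sample(rows, n)


data = list(range(100, 200))
first = sample_rows(data, 5, seed=2)
second = sample_rows(data, 5, seed=2)   # same seed: identical selection
other = sample_rows(data, 5, seed=3)    # different seed: a different generator state
```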

Select n records at random from a set of N

前提是你 submitted on 2019-11-29 15:27:09
I need to select n records at random from a set of N (where 0 < n < N). A possible algorithm is:

Iterate through the list and, for each element, make the probability of selection = (number needed) / (number left).

So if you had 40 items and needed 5, the first would have a 5/40 chance of being selected. If it is selected, the next has a 4/39 chance; otherwise it has a 5/39 chance. By the time you get to the end you will have your 5 items, and often you'll have all of them before that.

Assuming a good pseudo-random number generator, is this algorithm correct?

NOTE There're many questions of this kind on stackoverflow
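The procedure described above is commonly known as selection sampling (Algorithm S in Knuth's The Art of Computer Programming). A minimal Python sketch of it, assuming the records fit in a list so that "number left" is known at each step; select_n_of_N and rng are names chosen for this example:

```python
import random


def select_n_of_N(records, n, rng=None):
    """Scan once, taking each record with probability needed / left."""
    rng = rng or random.Random()
    records = list(records)
    needed = n
    chosen = []
    for i, rec in enumerate(records):
        left = len(records) - i
        # Select with probability needed / left; when left == needed this
        # is always true, so exactly n records are returned.
        if rng.random() * left < needed:
            chosen.append(rec)
            needed -= 1
            if needed == 0:
                break  # often reached before the end of the list
    return chosen


picked = select_n_of_N(range(40), 5)
```

The result keeps the records in their original order, which is a side effect of the single pass rather than a requirement of the question.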