random-sample

Randomly sample data frame into 3 groups in R

雨燕双飞 submitted on 2019-12-18 05:14:08
Question: Objective: randomly divide a data frame into 3 samples.
- one sample with 60% of the rows
- the other two samples with 20% of the rows each
- samples should not contain duplicates of one another (i.e. sample without replacement)

Here's a clunky solution:

    allrows <- 1:nrow(mtcars)
    set.seed(7)
    trainrows <- sample(allrows, replace = F, size = 0.6*length(allrows))
    test_cvrows <- allrows[-trainrows]
    testrows <- sample(test_cvrows, replace=F, size = 0.5*length(test_cvrows))
    cvrows <- test_cvrows[-which(test_cvrows %in%
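One way to picture the 60/20/20 split is to shuffle the row indices once and cut the shuffled list into three slices, which guarantees the groups do not overlap. The following is a minimal Python sketch of that idea rather than the R answer from the original thread; the function name `three_way_split` and its parameters are hypothetical.

```python
import random

def three_way_split(n_rows, fractions=(0.6, 0.2, 0.2), seed=7):
    """Return three disjoint lists of row indices that together cover range(n_rows)."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)  # a single shuffle is equivalent to sampling without replacement
    n_train = int(fractions[0] * n_rows)
    n_test = int(fractions[1] * n_rows)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    cv = indices[n_train + n_test:]  # the remainder absorbs any rounding
    return train, test, cv

# mtcars has 32 rows, so the indices 0..31 stand in for its rows here.
train, test, cv = three_way_split(32)
```

Because the last group takes whatever is left, the three sizes always add up to the number of rows even when the fractions do not divide it evenly.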

Randomly Pick Lines From a File Without Slurping It With Unix

放肆的年华 submitted on 2019-12-17 17:24:44
Question: I have a file with 10^7 lines, from which I want to choose 1/100 of the lines at random. This is the AWK code I have, but it slurps the whole file contents beforehand. My PC's memory cannot handle such slurps. Is there another approach?

    awk 'BEGIN{srand()}
    !/^$/{ a[c++]=$0 }
    END {
      for ( i=1;i<=c ;i++ ) {
        num=int(rand() * c)
        if ( a[num] ) {
          print a[num]
          delete a[num]
          d++
        }
        if ( d == c/100 ) break
      }
    }' file

Answer 1: If you have that many lines, are you sure you want exactly 1% or a statistical
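Two streaming approaches avoid holding the whole file in memory: keeping each line independently with probability 1/100 (an approximate 1%), or reservoir sampling (an exact count). The Python sketch below illustrates both under the assumption that blank lines are skipped, as in the AWK code; the function names are hypothetical and this is not the answer given in the original thread.

```python
import random

def bernoulli_sample(lines, p=0.01, rng=None):
    """Yield each non-empty line with probability p; memory use is constant."""
    rng = rng or random.Random()
    for line in lines:
        if line.strip() and rng.random() < p:
            yield line

def reservoir_sample(lines, k, rng=None):
    """Return exactly k non-empty lines chosen uniformly (fewer if the input is shorter).

    Only k lines are held in memory at any time (Algorithm R).
    """
    rng = rng or random.Random()
    reservoir = []
    seen = 0
    for line in lines:
        if not line.strip():
            continue
        seen += 1
        if len(reservoir) < k:
            reservoir.append(line)
        else:
            j = rng.randrange(seen)  # uniform over the lines seen so far
            if j < k:
                reservoir[j] = line
    return reservoir
```

Both functions accept any iterable of lines, so an open file object can be passed directly, for example `reservoir_sample(open("file"), 100000)`.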

How to improve this function to test Underscore Sample `_.sample()`

一笑奈何 submitted on 2019-12-14 03:35:08
Question: The setup: I am using _.sample() for the first time in some client code, and I wanted to test it to make sure that it produces an even distribution of samples time after time. To test this, I created the following code:

    (function(arraySize, timesToRun){
      arraySize = arraySize || 10;
      timesToRun = timesToRun || 1000;
      let myArray = Array.apply(null, {length: arraySize}).map(Number.call, Number);
      let resultsArray = Array.apply(null, Array(arraySize)).map(Number.prototype.valueOf,0);
      for (let i = 0
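The general shape of such a test is to draw many single samples, count how often each element comes up, and compare the counts with the expected uniform count. The Python sketch below uses `random.choice` as a stand-in for `_.sample()`, so it says nothing about Underscore itself; the function names and the use of a chi-square statistic are assumptions made for illustration.

```python
import random
from collections import Counter

def sample_counts(array_size=10, times_to_run=1000, seed=None):
    """Draw one element per run and return how often each index was picked."""
    rng = random.Random(seed)
    items = list(range(array_size))
    counts = Counter(rng.choice(items) for _ in range(times_to_run))
    return [counts[i] for i in items]

def chi_square_statistic(counts):
    """Sum of (observed - expected)^2 / expected under a uniform expectation."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

counts = sample_counts(10, 1000, seed=0)
statistic = chi_square_statistic(counts)
```

A statistic far above what the chi-square distribution with `array_size - 1` degrees of freedom allows would suggest the sampler is not uniform; small deviations between counts are expected from chance alone.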

R: Random sampling an even number of observations from a range of categories

坚强是说给别人听的谎言 submitted on 2019-12-13 16:28:47
Question: I previously took a random sample of postcodes from my data frame and then realised that I wasn't sampling across all higher-level statistical units. I have around 1 million postcodes and 7000 middle output statistical units. I want the sample to have roughly the same number of postcodes from each statistical unit. How do I randomly sample 35 postcodes from each higher-level statistical unit? I used the following code previously to randomly sample 250,000 postcodes: total.sample <- total
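Sampling a fixed number of rows from every category is often called stratified sampling: group the rows by unit, then sample within each group. Here is a minimal Python sketch of that idea, assuming each row carries its unit identifier; `sample_per_group` and the toy data are hypothetical and do not reproduce the R code from the thread.

```python
import random
from collections import defaultdict

def sample_per_group(rows, key, n_per_group, seed=None):
    """Sample up to n_per_group rows, without replacement, from each group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    sampled = []
    for members in groups.values():
        k = min(n_per_group, len(members))  # small groups contribute all their rows
        sampled.extend(rng.sample(members, k))
    return sampled

# Toy data: 700 (postcode, unit) pairs spread evenly over 7 units.
rows = [("PC%d" % i, i % 7) for i in range(700)]
picked = sample_per_group(rows, key=lambda r: r[1], n_per_group=35, seed=1)
```

With 7000 units and 35 postcodes per unit, the same approach would return at most 245,000 rows, close to the 250,000 mentioned in the question.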

What does the integer while setting the seed mean?

醉酒当歌 submitted on 2019-12-13 11:34:21
Question: I want to randomly select n rows from my data set using the sample() function in R. I was getting different outputs each time and hence used the set.seed() function to get the same output. I know that each integer passed to set.seed() will give me a unique output, and that the output will be the same if I set the same seed. But I'm not able to make out what the integer passed as a parameter to set.seed() means. Is it just an index that goes into the random generator algorithm or does
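The reproducibility described in the question can be illustrated in Python, where the seed is used to initialise the generator's internal state, so equal seeds replay the same sequence of draws. R's `set.seed()` is understood to play the same role, but the sketch below only demonstrates Python's `random` module; the helper name `draw` is hypothetical.

```python
import random

def draw(seed, n=5, population=range(100)):
    """Sample n items without replacement using a generator initialised from seed."""
    rng = random.Random(seed)
    return rng.sample(population, n)

first = draw(42)
second = draw(42)  # same seed, so the same internal state and the same draws
```

The integer itself carries no meaning beyond selecting a starting state; it is not, for instance, the position of a row to be picked.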

Rcpp R sample equivalent from a NumericVector

拥有回忆 submitted on 2019-12-13 08:53:47
Question: I have created a NumericVector and I need to sample one random integer from it. I tried to use various RcppArmadillo functions, but none of them worked for me. The function is below:

    //#include <algorithm>
    #include <RcppArmadilloExtensions/sample.h>
    using namespace Rcpp;
    using namespace arma;
    using namespace std;

    int simulateNextStepC(double currentAmount, double lastPaid, int currentStatus, int currentMaturity, NumericMatrix amountLinkMatrix, NumericMatrix statusMatrix, double

Program simple simulation in R

╄→гoц情女王★ submitted on 2019-12-13 06:22:51
Question: Editing this post for simplification, as suggested by @agstudy.

I am trying to develop a model that simulates a polymer using a random uniform distribution. The model has 2 states:

State 1 (probability of state 1 if in state 2 is .003):
  growth probability, A = .01
  shrink probability, B = .0025
State 2 (probability of state 2 if in state 1 is .0003):
  growth probability, A = .01
  shrink probability, E = .05

The simulation starts in State 1. While in State 1, sample random numbers from data.frame1, if # <

Taking Sample in SQL Query

让人想犯罪 __ submitted on 2019-12-13 03:59:26
Question: I'm working on a problem which is something like this: I have a table with many columns, but the main ones are DepartmentId and EmployeeIds.

    Employee Ids    Department Ids
    ------------------------------
    A               1
    B               1
    C               1
    D               1
    AA              2
    BB              2
    CC              2
    A1              3
    B1              3
    C1              3
    D1              3

I want to write a SQL query such that I take out 2 sample EmployeeIds for each DepartmentID, like:

    Employee Id    Dept Ids
    B              1
    C              1
    AA             2
    CC             2
    D1             3
    A1             3

Currently I am writing the query: select EmployeeId, DeptIds, count(*) from table_name group by 1,2 sample

Block sampling according to index in panel data

爱⌒轻易说出口 submitted on 2019-12-12 15:14:50
Question: I have panel data, i.e. t rows for each of n observations (n x t), such as:

    data("Grunfeld", package="plm")
    head(Grunfeld)

    firm  year  inv    value   capital
    1     1935  317.6  3078.5  2.8
    1     1936  391.8  4661.7  52.6
    1     1937  410.6  5387.1  156.9
    2     1935  257.7  2792.2  209.2
    2     1936  330.8  4313.2  203.4
    2     1937  461.2  4643.9  207.2

I want to do block bootstrapping, i.e. I want to resample with replacement, taking a firm [i] with all the years in which it is observed. For instance, if year=1935:1937 and firm 1 is randomly
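Block bootstrapping of this kind amounts to drawing firm identifiers with replacement and then copying every row of each drawn firm, so a firm's years always travel together. The Python sketch below illustrates that on the six rows shown in the question; the function name `block_bootstrap` and the choice to prepend a new block identifier are assumptions, not the approach from the original thread.

```python
import random
from collections import defaultdict

# The six rows printed by head(Grunfeld): (firm, year, inv, value, capital).
grunfeld_head = [
    (1, 1935, 317.6, 3078.5, 2.8),
    (1, 1936, 391.8, 4661.7, 52.6),
    (1, 1937, 410.6, 5387.1, 156.9),
    (2, 1935, 257.7, 2792.2, 209.2),
    (2, 1936, 330.8, 4313.2, 203.4),
    (2, 1937, 461.2, 4643.9, 207.2),
]

def block_bootstrap(rows, block_key, seed=None):
    """Resample whole blocks with replacement, keeping each block's rows together."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for row in rows:
        blocks[block_key(row)].append(row)
    ids = list(blocks)
    drawn = rng.choices(ids, k=len(ids))  # with replacement, so a firm can repeat
    resampled = []
    for new_id, original_id in enumerate(drawn, start=1):
        for row in blocks[original_id]:
            # Prepend a fresh identifier so repeated firms stay distinguishable.
            resampled.append((new_id,) + tuple(row))
    return resampled

resampled = block_bootstrap(grunfeld_head, block_key=lambda r: r[0], seed=3)
```

Each output row has the form `(new_id, firm, year, inv, value, capital)`, which keeps the panel structure intact even when the same firm is drawn more than once.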

how to generate integer inter arrival times using random.expovariate() in python

眉间皱痕 submitted on 2019-12-12 12:27:09
Question: In Python's random module, the expovariate() function generates floating-point numbers which can be used to model inter-arrival times of a Poisson process. How do I make use of this to generate integer times between arrivals instead of floating-point numbers?

Answer 1: jonrsharpe already more or less mentioned it: you can just let the function generate floating-point numbers and convert the output to integers yourself using int(). This

    >>> import random
    >>> [random.expovariate(0.2) for i in range(10)]
    [7
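The conversion the answer describes can be wrapped in a small helper. The sketch below is a hedged illustration with a hypothetical function name; it assumes that truncation with `int()` is acceptable, which means an inter-arrival time of 0 can occur, and it lets the caller swap in `math.ceil` when every gap should be at least one time unit.

```python
import math
import random

def integer_interarrival_times(rate, n, seed=None, rounding=int):
    """Draw n exponential inter-arrival times and convert each to an integer.

    rounding=int truncates towards zero; rounding=math.ceil rounds up instead.
    """
    rng = random.Random(seed)
    return [rounding(rng.expovariate(rate)) for _ in range(n)]

truncated = integer_interarrival_times(0.2, 10, seed=1)
rounded_up = integer_interarrival_times(0.2, 10, seed=1, rounding=math.ceil)
```

Either choice changes the distribution from continuous to discrete, so the resulting process is only an approximation of a Poisson process on an integer time grid.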