random-sample | 易学教程

Randomly sample data frame into 3 groups in R

Question: Objective: Randomly divide a data frame into 3 samples.
- one sample with 60% of the rows
- the other two samples with 20% of the rows each
- samples should not contain duplicates of one another (i.e. sample without replacement)

Here's a clunky solution:

    allrows <- 1:nrow(mtcars)
    set.seed(7)
    trainrows <- sample(allrows, replace = F, size = 0.6*length(allrows))
    test_cvrows <- allrows[-trainrows]
    testrows <- sample(test_cvrows, replace=F, size = 0.5*length(test_cvrows))
    cvrows <- test_cvrows[-which(test_cvrows %in%
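For comparison, the same 60/20/20 objective can be sketched in Python. This is not the asker's R code: the function name `split_indices` and its parameters are hypothetical, and the approach (shuffle the row indices once, then slice) is only one possible way to get disjoint groups.

```python
import random


def split_indices(n_rows, fractions=(0.6, 0.2, 0.2), seed=7):
    """Partition range(n_rows) into disjoint random groups sized by `fractions`.

    Hypothetical helper for illustration; the seed default mirrors set.seed(7)
    in the excerpt but will not reproduce R's random stream.
    """
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)  # a single shuffle means the slices below cannot overlap
    groups, start = [], 0
    for i, frac in enumerate(fractions):
        # The last group takes whatever remains, so no row is lost to rounding.
        if i == len(fractions) - 1:
            end = n_rows
        else:
            end = start + int(round(frac * n_rows))
        groups.append(indices[start:end])
        start = end
    return groups


# mtcars has 32 rows, so the groups come out as 19, 6 and 7 row indices.
train, test, cv = split_indices(32)
```

Because every index appears in exactly one slice, the three groups are sampled without replacement by construction.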

Randomly Pick Lines From a File Without Slurping It With Unix

Question: I have a file with 10^7 lines, from which I want to choose 1/100 of the lines at random. This is the AWK code I have, but it slurps all the file content beforehand. My PC's memory cannot handle such slurps. Is there another approach?

    awk 'BEGIN{srand()}
    !/^$/{ a[c++]=$0}
    END {
      for ( i=1;i<=c ;i++ ) {
        num=int(rand() * c)
        if ( a[num] ) {
          print a[num]
          delete a[num]
          d++
        }
        if ( d == c/100 ) break
      }
    }' file

Answer 1: If you have that many lines, are you sure you want exactly 1% or a statistical
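The distinction raised in the answer (an exact count versus a statistical sample) can be illustrated with two single-pass Python sketches. Neither is taken from the original thread; the function names are hypothetical. The first keeps each line independently with probability `p`, so the sample size is only approximately 1%. The second is reservoir sampling, which returns exactly `k` lines while holding only `k` lines in memory.

```python
import random


def bernoulli_sample(lines, p=0.01, seed=None):
    """Yield each non-empty line independently with probability p.

    The number of lines returned is random (about p times the input size).
    """
    rng = random.Random(seed)
    for line in lines:
        if line.strip() and rng.random() < p:
            yield line


def reservoir_sample(lines, k, seed=None):
    """Return k lines chosen uniformly at random in one pass.

    If the input has fewer than k lines, all of them are returned.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, line in enumerate(lines):
        if i < k:
            reservoir.append(line)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive at both ends
            if j < k:
                reservoir[j] = line  # replace an existing entry
    return reservoir


# Both functions accept any iterable of lines, including an open file handle,
# e.g. (hypothetical path):  with open("file") as fh: reservoir_sample(fh, 100_000)
```

Either function reads the input line by line, so the whole file never has to fit in memory.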

How to improve this function to test Underscore Sample `_.sample()`

Question: The setup: I am using _.sample() for the first time in some client's code, and I wanted to test it to make sure that it produces an even distribution of samples time after time. To test this, I created the following code:

    (function(arraySize, timesToRun){
      arraySize = arraySize || 10;
      timesToRun = timesToRun || 1000;
      let myArray = Array.apply(null, {length: arraySize}).map(Number.call, Number);
      let resultsArray = Array.apply(null, Array(arraySize)).map(Number.prototype.valueOf,0);
      for (let i = 0
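The idea behind such a test (draw many times, count how often each element comes up, compare with the expected count) can be sketched in Python. This is an illustration under assumptions: `random.choice` stands in for `_.sample()`, the function names are hypothetical, and the chi-square statistic is one common way to summarise the deviation, not something the excerpt itself uses.

```python
import random
from collections import Counter


def sample_counts(array_size=10, times_to_run=1000, seed=None):
    """Draw one element `times_to_run` times and count hits per element."""
    rng = random.Random(seed)
    items = list(range(array_size))
    counts = Counter(rng.choice(items) for _ in range(times_to_run))
    return [counts[i] for i in items]  # elements never drawn count as 0


def chi_square_statistic(counts):
    """Sum of (observed - expected)^2 / expected under a uniform expectation."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


counts = sample_counts(array_size=10, times_to_run=1000, seed=0)
statistic = chi_square_statistic(counts)
```

With 10 categories there are 9 degrees of freedom, and a statistic above roughly 16.92 would be unusual at the 5% level for a truly uniform sampler. A single run can still exceed that by chance, so repeated runs give a fairer picture.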

R: Random sampling an even number of observations from a range of categories

Question: I previously took a random sample of postcodes from my data frame and then realised that I wasn't sampling across all higher-level statistical units. I have around 1 million postcodes and 7000 middle output statistical units. I want the sample to have roughly the same number of postcodes from each statistical unit. How do I randomly sample 35 postcodes from each higher-level statistical unit? I used the following code previously to randomly sample 250,000 postcodes:

    total.sample <- total
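Sampling a fixed number of rows from each group (stratified sampling) can be sketched in Python as follows. The function `sample_per_group`, the record layout, and the toy data are all hypothetical; the excerpt's actual data frame and column names are not shown, so nothing here should be read as the asker's code.

```python
import random
from collections import defaultdict


def sample_per_group(records, key, n_per_group, seed=None):
    """Sample up to n_per_group records without replacement from each group.

    `key` maps a record to its group (e.g. its statistical unit).
    Groups smaller than n_per_group are returned in full.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    sampled = []
    for members in groups.values():
        k = min(n_per_group, len(members))
        sampled.extend(rng.sample(members, k))
    return sampled


# Toy data: 700 made-up postcodes spread evenly over 7 made-up units.
records = [("PC%d" % i, i % 7) for i in range(700)]
picked = sample_per_group(records, key=lambda r: r[1], n_per_group=35, seed=1)
```

Capping `k` at the group size avoids an error for units that contain fewer than 35 postcodes.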

What does the integer while setting the seed mean?

Question: I want to randomly select n rows from my data set using the sample() function in R. I was getting different outputs each time and hence used the set.seed() function to get the same output. I know that each integer passed to set.seed() will give me a unique output, and that the output will be the same if I set the same seed. But I'm not able to make out what the integer that is passed as a parameter to the set.seed() function means. Is it just an index that goes into the random generator algorithm or does
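The reproducibility described in the question can be demonstrated with a small Python sketch. It uses Python's `random` module rather than R's `set.seed()`, so it shows the general behaviour of a seeded generator, not R's specific algorithm; the helper `draw` is hypothetical.

```python
import random


def draw(seed, n=5, population=range(100)):
    """Sample n distinct values using a generator initialised from `seed`."""
    rng = random.Random(seed)  # the seed only fixes the generator's starting state
    return rng.sample(population, n)


first = draw(42)
second = draw(42)  # same seed, same starting state, same sample
other = draw(43)   # a different seed gives a different starting state
```

Here `first` and `second` are identical because the generator starts from the same state. The integer itself carries no meaning beyond selecting that state, and nearby seeds such as 42 and 43 are not expected to give related samples.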

Rcpp R sample equivalent from a NumericVector

Question: I have created a NumericVector and I need to sample one random integer from it. I tried to use various RcppArmadillo functions, but they failed to work for me. The function is below:

    //#include <algorithm>
    #include <RcppArmadilloExtensions/sample.h>
    using namespace Rcpp;
    using namespace arma;
    using namespace std;
    int simulateNextStepC(double currentAmount, double lastPaid, int currentStatus, int currentMaturity, NumericMatrix amountLinkMatrix, NumericMatrix statusMatrix, double

Program simple simulation in R

Question: Editing this post for simplification according to @agstudy.

I am trying to develop a model that simulates a polymer using a random uniform distribution. The model has 2 states:

State 1 (probability of state 1 if in state 2 is .003):
    growth probability, A = .01
    shrink probability, B = .0025
State 2 (probability of state 2 if in state 1 is .0003):
    growth probability, A = .01
    shrink probability, E = .05

Simulation starts in State 1.
While in State 1, sample random numbers from data.frame1, if # <

Taking Sample in SQL Query

Question: I'm working on a problem which is something like this: I have a table with many columns, but the major ones are DepartmentId and EmployeeIds.

    Employee Ids   Department Ids
    ------------------------------
    A              1
    B              1
    C              1
    D              1
    AA             2
    BB             2
    CC             2
    A1             3
    B1             3
    C1             3
    D1             3

I want to write a SQL query that takes 2 sample EmployeeIds for each DepartmentID, like:

    Employee Id   Dept Ids
    B             1
    C             1
    AA            2
    CC            2
    D1            3
    A1            3

Currently I am writing the query: select EmployeeId, DeptIds, count(*) from table_name group by 1,2 sample

Block sampling according to index in panel data

Question: I have panel data, i.e. t rows for each of n observations (n x t), such as

    data("Grunfeld", package="plm")
    head(Grunfeld)
    firm year   inv  value capital
       1 1935 317.6 3078.5     2.8
       1 1936 391.8 4661.7    52.6
       1 1937 410.6 5387.1   156.9
       2 1935 257.7 2792.2   209.2
       2 1936 330.8 4313.2   203.4
       2 1937 461.2 4643.9   207.2

I want to do block bootstrapping, i.e. I want to resample with replacement, taking a firm [i] with all the years in which it is observed. For instance, if year=1935:1937 and firm 1 is randomly
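The block bootstrap described here (draw firms with replacement, then keep every year of each drawn firm) can be sketched in Python using the six rows shown in the excerpt. The function `block_bootstrap` and the choice to prepend a new block id to each row are assumptions made for illustration; they are not taken from the thread or from the plm package.

```python
import random
from collections import defaultdict

# The six rows shown in the excerpt: (firm, year, inv, value, capital).
GRUNFELD_HEAD = [
    (1, 1935, 317.6, 3078.5, 2.8),
    (1, 1936, 391.8, 4661.7, 52.6),
    (1, 1937, 410.6, 5387.1, 156.9),
    (2, 1935, 257.7, 2792.2, 209.2),
    (2, 1936, 330.8, 4313.2, 203.4),
    (2, 1937, 461.2, 4643.9, 207.2),
]


def block_bootstrap(rows, block_key, seed=None):
    """Resample whole blocks with replacement, keeping each block's rows together.

    Each output row is prefixed with a new block id (1, 2, ...) so that a firm
    drawn twice still forms two distinguishable blocks.
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for row in rows:
        blocks[block_key(row)].append(row)
    ids = list(blocks)
    drawn = rng.choices(ids, k=len(ids))  # sampling with replacement
    resampled = []
    for new_id, old_id in enumerate(drawn, start=1):
        for row in blocks[old_id]:
            resampled.append((new_id,) + tuple(row))
    return resampled


resampled = block_bootstrap(GRUNFELD_HEAD, block_key=lambda r: r[0], seed=3)
```

The new block id matters when the resampled data is fed to a panel estimator, since two copies of the same firm would otherwise collapse into one unit with duplicated years.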

how to generate integer inter arrival times using random.expovariate() in python

Question: In Python's random module, the expovariate() function generates floating-point numbers which can be used to model inter-arrival times of a Poisson process. How do I make use of this to generate integer times between arrivals instead of floating-point numbers?

Answer 1: jonrsharpe already kind of mentioned it: you can just let the function generate floating-point numbers and convert the output to integers yourself using int(). This

    >>> import random
    >>> [random.expovariate(0.2) for i in range(10)]
    [7
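The int() conversion mentioned in the answer can be wrapped up as a minimal sketch. The function name `integer_interarrival_times` and its `seed` parameter are hypothetical additions for illustration; only `random.expovariate()` and `int()` come from the excerpt.

```python
import random


def integer_interarrival_times(rate, n, seed=None):
    """Return n non-negative integer gaps by truncating exponential draws.

    int() truncates towards zero, so a draw of 7.9 becomes 7 and any draw
    below 1.0 becomes 0 (two arrivals in the same time step).
    """
    rng = random.Random(seed)
    return [int(rng.expovariate(rate)) for _ in range(n)]


gaps = integer_interarrival_times(0.2, 10, seed=0)
```

One thing to keep in mind: truncating an exponential variable with rate λ gives a geometric distribution on 0, 1, 2, ... with mean 1/(e^λ - 1), which is smaller than the original mean 1/λ. If zero-length gaps are not acceptable, adding 1 to each truncated value is one possible adjustment, at the cost of shifting the mean again.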