random-sample | 易学教程

Fast random weighted selection across all rows of a stochastic matrix

Question: numpy.random.choice allows weighted selection from a vector, e.g.

    arr = numpy.array([1, 2, 3])
    weights = numpy.array([0.2, 0.5, 0.3])
    choice = numpy.random.choice(arr, p=weights)

selects 1 with probability 0.2, 2 with probability 0.5, and 3 with probability 0.3.

What if we wanted to do this quickly, in a vectorized fashion, for a 2D array (matrix) in which each row is a vector of probabilities? That is, we want a vector of choices from a stochastic matrix. This is the super slow
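One common vectorized approach (a sketch, not taken from the original answers) is inverse-CDF sampling done for every row at once: take the cumulative sum along each row, draw one uniform number per row, and pick the first column whose cumulative probability exceeds the draw. The helper name choose_per_row is hypothetical, and the code assumes every row is non-negative and sums to 1.

```python
import numpy as np

def choose_per_row(probs, rng=None):
    """Return one column index per row of a row-stochastic matrix.

    Hypothetical helper: inverse-CDF sampling, vectorized over rows.
    Assumes each row is non-negative and sums to 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=float)
    cdf = probs.cumsum(axis=1)
    # Divide by each row's total so the last column is exactly 1.0;
    # this guards against rounding leaving it just below a draw.
    cdf = cdf / cdf[:, -1:]
    # One uniform draw in [0, 1) per row, shaped (n_rows, 1) so it broadcasts.
    u = rng.random((probs.shape[0], 1))
    # argmax on a boolean array returns the first True column in each row.
    return (cdf > u).argmax(axis=1)
```

To map indices back to values, index the value vector with the result, e.g. arr[choose_per_row(P)] for an array arr with one entry per column.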

Python random lines from subfolders

Question: I have many tasks in .txt files in multiple subfolders. I am trying to pick a total of 10 tasks at random from these folders, the files they contain, and finally a line of text within a file. The selected line should be deleted or marked so that it will not be picked in the next execution. This may be too broad a question, but I'd appreciate any input or direction. Here's the code I have so far:

    #!/usr/bin/python
    import random
    with open('C:\\Tasks\\file.txt') as f:
        lines = random.sample(f.readlines(),10
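A minimal sketch of one way to do this, under stated assumptions: every non-empty line of every .txt file below a root folder counts as one task, 10 of them are drawn with random.sample, and each affected file is rewritten without the picked lines so they cannot be picked on the next run. Note that this makes every line equally likely, which differs from choosing a folder, then a file, then a line. The function name pick_tasks and the root path are hypothetical.

```python
import random
from collections import defaultdict
from pathlib import Path

def pick_tasks(root, k=10, rng=random):
    """Pick k random task lines from all .txt files under root and delete them.

    Hypothetical helper: every non-empty line is one task, each equally likely.
    """
    candidates = []  # (path, line_index, text) for every task line
    for path in sorted(Path(root).rglob("*.txt")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for i, line in enumerate(lines):
            if line.strip():
                candidates.append((path, i, line))
    picked = rng.sample(candidates, min(k, len(candidates)))

    # Group the picked line numbers by file, then rewrite each file without them.
    to_remove = defaultdict(set)
    for path, i, _ in picked:
        to_remove[path].add(i)
    for path, indices in to_remove.items():
        lines = path.read_text(encoding="utf-8").splitlines()
        kept = [line for i, line in enumerate(lines) if i not in indices]
        path.write_text("\n".join(kept) + ("\n" if kept else ""), encoding="utf-8")

    return [text for _, _, text in picked]
```

Calling pick_tasks(r"C:\Tasks") on each run would then return up to 10 tasks and remove them from the files; "marking" instead of deleting would only change how kept is built.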

JavaScript - How to randomly sample items without replacement?

Question: (JavaScript) I've tried searching for something like this, but I haven't been able to find it. It's a simple idea:
a. Take a random number between 0 and 10.
b. Let's say the random number rolled is a 3.
c. Then, save the number (the 3).
d. Now, take another random number between 0 and 10, but it can't be the 3, because it has already appeared.

Answer 1: One solution is to generate an array (a "bucket") with all the values you want to pick, in this case all numbers from 0 to 10. Then you pick one randomly from the array and remove it from the bucket. Note that the example below doesn't check if the bucket is
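The bucket idea from Answer 1 is sketched below in Python for illustration (the question asks about JavaScript, but the logic ports line for line). Instead of removing from the middle of the array, the sketch swaps the picked item with the last one and pops it, which keeps each removal O(1). The function name draw_without_replacement is hypothetical; in Python, random.sample already does this job.

```python
import random

def draw_without_replacement(values, k, rng=random):
    """Draw k distinct items from values using a bucket that shrinks.

    Hypothetical helper illustrating the bucket idea; random.sample does the same.
    """
    bucket = list(values)
    if k > len(bucket):
        raise ValueError("cannot draw more items than the bucket holds")
    drawn = []
    for _ in range(k):
        i = rng.randrange(len(bucket))
        # Move the picked item to the end and pop it, so removal is O(1).
        bucket[i], bucket[-1] = bucket[-1], bucket[i]
        drawn.append(bucket.pop())
    return drawn
```

For the question's scenario, draw_without_replacement(range(11), 2) returns two different numbers from 0 to 10.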

Generate sample of 1,000,000 random permutations

Question: I am working with a large number of integer permutations. Each permutation has K elements, and each element is 1 byte. I need to generate N unique random permutations. Constraints: K <= 144, N <= 1,000,000. I came up with the following straightforward algorithm:
1. Generate a list of N random permutations.
2. Store all permutations in RAM.
3. Sort the list and delete all duplicates (if any). The number of duplicates will be relatively small.
4. If there were any duplicates, add random
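A sketch of a variant of the plan above (not the asker's exact algorithm): each permutation is stored as a bytes object, which works because K <= 144 means every element fits in one byte, and the permutations go into a set, so duplicates are discarded as they are generated instead of in a separate sort-and-deduplicate pass. The function name unique_random_permutations is hypothetical. Raw storage is about N * K bytes (roughly 144 MB at the stated limits), plus Python's per-object overhead.

```python
import math
import random

def unique_random_permutations(k, n, rng=random):
    """Return n distinct random permutations of range(k), each as a bytes object.

    Hypothetical helper: a set rejects duplicates as they are generated,
    replacing the sort-and-deduplicate pass.
    """
    if k > 256:
        raise ValueError("elements must fit in one byte")
    # Guard against asking for more permutations than exist (k!).
    if n > math.factorial(k):
        raise ValueError("n exceeds the number of distinct permutations")
    seen = set()
    perm = list(range(k))
    while len(seen) < n:
        rng.shuffle(perm)  # uniform random permutation (Fisher-Yates)
        seen.add(bytes(perm))
    return list(seen)
```

With K = 144 there are 144! possible permutations, so collisions are vanishingly rare and the loop almost never repeats; the set mainly serves as a cheap guarantee of uniqueness.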

Random Sample of a subset of a dataframe in Pandas

Question: Say I have a DataFrame with 100,000 entries and want to split it into 100 sections of 1,000 entries each. How do I take a random sample of, say, size 50 from just one of the 100 sections? The dataset is already ordered so that the first 1,000 rows are the first section, the next 1,000 rows are the next section, and so on. Many thanks.

Answer 1: You can use the sample method*:

    In [11]: df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], columns=["A", "B"])

    In [12]: df.sample(2)
    Out[12]:
       A  B
    0  1  2
    2  5  6

    In [13]: df.sample
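Building on the sample method from Answer 1, one way to restrict the sample to a single section is to slice the section by position with iloc first and then call sample on the slice. The section size of 1,000 comes from the question; the helper name sample_section and the example data are hypothetical, and random_state is passed only to make the draw reproducible.

```python
import pandas as pd

def sample_section(df, section, n=50, section_size=1000, random_state=None):
    """Randomly sample n rows from one contiguous, fixed-size section of df.

    Hypothetical helper: section 0 is rows 0..999, section 1 is rows 1000..1999, etc.
    """
    start = section * section_size
    block = df.iloc[start:start + section_size]  # positional slice, end exclusive
    return block.sample(n=n, random_state=random_state)

# Hypothetical data shaped like the question: 100,000 rows in 100 sections.
df = pd.DataFrame({"value": range(100_000)})
picked = sample_section(df, section=3, n=50, random_state=0)
```

Because iloc slices by position, this works even if the index is not a simple 0..N-1 range.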

from data table, randomly select one row per group

Question: I'm looking for an efficient way to select rows from a data table such that I have one representative row for each unique value in a particular column. Here is a simple example:

    require(data.table)
    y = c('a','b','c','d','e','f','g','h')
    x = sample(2:10, 8, replace = TRUE)
    z = rep(y, x)
    dt = as.data.table(z)

My objective is to subset the data table dt by sampling one row for each letter a-h in column z.

Answer 1: The OP provided only a single column in the example. Assuming that there are multiple
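The question is about R's data.table; the sketch below is a plain-Python illustration of the underlying idea, not data.table code: group the rows by the key column, then pick one row at random from each group. The helper name one_row_per_group and the toy data (each letter a-h repeated 2 to 10 times, mirroring the question's example) are hypothetical.

```python
import random
from collections import defaultdict

def one_row_per_group(rows, key, rng=random):
    """Return one randomly chosen row for each distinct value of row[key].

    Hypothetical helper; groups come back in order of first appearance.
    """
    groups = defaultdict(list)  # key value -> rows having that value
    for row in rows:
        groups[row[key]].append(row)
    return [rng.choice(members) for members in groups.values()]

# Hypothetical toy data: each letter repeated 2 to 10 times, like rep(y, x).
rows = []
for letter in "abcdefgh":
    for _ in range(random.randint(2, 10)):
        rows.append({"z": letter, "id": len(rows)})
sample = one_row_per_group(rows, "z")
```

This makes a single pass to build the groups plus one random pick per group, so it scales linearly with the number of rows.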

Select a random sample of results from a query result

Question: This question asks about getting a random(ish) sample of records on SQL Server, and the answer was to use TABLESAMPLE. Is there an equivalent in Oracle 10? If there isn't, is there a standard way to get a random sample of results from a query? For example, how can one get 1,000 random rows from a query that would normally return millions?

Answer 1:

    SELECT * FROM (
        SELECT * FROM mytable ORDER BY dbms_random.value
    ) WHERE rownum <= 1000

Answer 2: The SAMPLE clause will give you a random sample
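The "order by a random value, keep the first rows" pattern from Answer 1 can be tried locally with Python's built-in sqlite3 module. This is SQLite, not Oracle: RANDOM() stands in for dbms_random.value and LIMIT for rownum <= 1000. Oracle needs the subquery because rownum is assigned before ORDER BY is applied, while SQLite applies LIMIT after ordering, so a flat query suffices. The demo data is hypothetical. As with the Oracle version, every row is scored and sorted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, payload TEXT)")
# Hypothetical demo data: 100,000 rows.
conn.executemany(
    "INSERT INTO mytable (id, payload) VALUES (?, ?)",
    ((i, f"row {i}") for i in range(100_000)),
)

# Shuffle all rows by a random key and keep the first 1,000.
sample = conn.execute(
    "SELECT * FROM mytable ORDER BY RANDOM() LIMIT 1000"
).fetchall()
```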

Simple Random Samples from a Sql database

Question: How do I take an efficient simple random sample in SQL? The database in question is running MySQL; my table has at least 200,000 rows, and I want a simple random sample of about 10,000. The "obvious" answer is:

    SELECT * FROM table ORDER BY RAND() LIMIT 10000

For large tables, that's too slow: it calls RAND() for every row (which already puts it at O(n)) and then sorts the rows, making it O(n lg n) at best. Is there a way to do this faster than O(n)?

Note: As Andrew Mao points out in the comments, if you're using this approach on SQL Server, you should use the T-SQL function NEWID(), because RAND()
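One sort-free alternative (a swapped-in technique, not from the source) is reservoir sampling, "Algorithm R": it streams over the rows once and keeps an exact simple random sample of k items in O(n) time and O(k) memory, removing the O(n lg n) sort. It does not get below O(n), since every row is still read; it could, for example, be applied client-side to a database cursor, which is iterable. The function name reservoir_sample is hypothetical.

```python
import random

def reservoir_sample(iterable, k, rng=random):
    """Return a simple random sample of k items from iterable in one pass.

    Hypothetical helper implementing Algorithm R: after i + 1 items have been
    seen, each of them is in the reservoir with probability k / (i + 1).
    """
    reservoir = []
    for i, item in enumerate(iterable):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randrange(i + 1)  # uniform in [0, i]
            if j < k:
                reservoir[j] = item  # replace a random slot with probability k/(i+1)
    return reservoir
```

For the question's numbers, reservoir_sample(rows, 10_000) over 200,000 rows keeps only 10,000 rows in memory at any time.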

Binary random array with a specific proportion of ones?

Question: What is an efficient (probably "vectorized", in MATLAB terminology) way to generate a random array of zeros and ones with a specific proportion, specifically with NumPy? My case is specifically 1/3, and my code is:

    import numpy as np
    a=np.mod(np.multiply(np.random.randomintegers(0,2,size)),3)

But is there any built-in function that could handle this more efficiently, at least for the case of K/N where K and N are natural numbers?

Answer 1: Yet another approach, using np.random.choice:

    >>> np.random
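Two sketches using NumPy's Generator API, covering the two readings of "a specific proportion". Drawing each entry independently with choice and p=[1 - K/N, K/N] gives ones with probability K/N, so the observed fraction only matches on average. For an exact fraction, fill the right number of ones and shuffle. The function names are hypothetical, and the exact version rounds down when size * K is not divisible by N.

```python
import numpy as np

rng = np.random.default_rng()

def binary_approx(size, k, n):
    """Hypothetical helper: each entry is 1 independently with probability k/n."""
    return rng.choice([0, 1], size=size, p=[1 - k / n, k / n])

def binary_exact(size, k, n):
    """Hypothetical helper: exactly floor(size * k / n) ones at random positions."""
    ones = size * k // n
    a = np.zeros(size, dtype=np.int8)
    a[:ones] = 1
    rng.shuffle(a)  # in-place random permutation of the array
    return a
```

For the 1/3 case, binary_exact(size, 1, 3) guarantees the proportion, while binary_approx(size, 1, 3) lets it fluctuate around 1/3.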