random-sample

Randomly split data by criterion into training and testing data sets using R

断了今生、忘了曾经 · submitted on 2019-11-29 12:26:33
Gidday, I'm looking for a way to randomly split a data frame (e.g. a 90/10 split) into testing and training sets for a model while keeping a certain grouping criterion. Imagine I have a data frame like this:

```
> test[1:20,]
   companycode year    expenses
1           C1    1     8.47720
2           C1    2     8.45250
3           C1    3     8.46280
4           C2    1 14828.90603
5           C3    1   665.21565
6           C3    2   290.66596
7           C3    3   865.56265
8           C3    4  6785.03586
9           C3    5   312.02617
10          C3    6   760.48740
11          C3    7  1155.76758
12          C4    1  4565.78313
13          C4    2  3340.36540
14          C4    3  2656.73030
15          C4    4  1079.46098
16          C5    1    60.57039
17          C6    1  6282.48118
18          C6    2  7419.32720
19          C7    1   644.90571
20          C8    1 58332.34945
```

What I'm
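The question body is truncated, but the usual approach to a grouped split is to sample at the group (company) level, so that every row of a company lands in the same partition. A minimal pandas sketch of that idea (the small data frame below is a made-up stand-in using the column names from the excerpt):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data with the same columns as the question.
df = pd.DataFrame({
    "companycode": ["C1", "C1", "C1", "C2", "C3", "C3", "C4",
                    "C5", "C6", "C6", "C7", "C8"],
    "year":     [1, 2, 3, 1, 1, 2, 1, 1, 1, 2, 1, 1],
    "expenses": [8.48, 8.45, 8.46, 14828.91, 665.22, 290.67,
                 4565.78, 60.57, 6282.48, 7419.33, 644.91, 58332.35],
})

rng = np.random.default_rng(42)
companies = df["companycode"].unique()

# Put ~90% of the *companies* in training; each company's rows stay together.
n_train = int(round(0.9 * len(companies)))
train_companies = rng.choice(companies, size=n_train, replace=False)

train = df[df["companycode"].isin(train_companies)]
test = df[~df["companycode"].isin(train_companies)]
```

The same company-level sampling translates directly to R with `sample(unique(test$companycode), ...)` followed by subsetting.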

Sampling from Oracle, Need exact number of results (Sample Clause)

℡╲_俬逩灬. · submitted on 2019-11-29 11:00:20
I am trying to pull a random sample of a population from a PeopleSoft database. Searches online have led me to think that the SAMPLE clause of the SELECT statement may be a viable option for us, but I am having trouble understanding how the SAMPLE clause determines the number of samples returned. I have looked at the Oracle documentation found here: http://docs.oracle.com/cd/E11882_01/server.112/e26088/statements_10002.htm#i2065953 But the above reference only talks about the syntax used to create the sample. The reason for my question is I need to understand how the sample
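The key point the asker is circling: Oracle's row-level `SAMPLE(p)` includes each row independently with probability p/100, so the number of rows returned is binomially distributed around p% of the table, not exact. A small Python simulation of that semantics (the table size is a made-up assumption) shows why repeated runs return different counts:

```python
import random

random.seed(0)

n_rows = 10_000   # hypothetical table size
pct = 1.0         # simulates SAMPLE(1): each row kept with probability 0.01

# Run the "query" 100 times and record how many rows each run returns.
counts = []
for _ in range(100):
    kept = sum(1 for _ in range(n_rows) if random.random() < pct / 100)
    counts.append(kept)

# Expected count is 100 rows, but individual runs scatter around it.
mean_count = sum(counts) / len(counts)
```

If an exact sample size is required, the common workaround is to order by a random value and limit the result (e.g. `ORDER BY dbms_random.value` with `FETCH FIRST n ROWS ONLY`) rather than relying on `SAMPLE`.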

Random Sample of a subset of a dataframe in Pandas

两盒软妹~` · submitted on 2019-11-29 09:07:38
Say I have a dataframe with 100,000 entries and want to split it into 100 sections of 1,000 entries. How do I take a random sample of, say, size 50 of just one of the 100 sections? The data set is already ordered such that the first 1,000 results are the first section, the next 1,000 the next section, and so on. Many thanks.

You can use the sample method*:

```
In [11]: df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], columns=["A", "B"])

In [12]: df.sample(2)
Out[12]:
   A  B
0  1  2
2  5  6

In [13]: df.sample(2)
Out[13]:
   A  B
3  7  8
0  1  2
```

* On one of the section DataFrames.

Note: If you have a larger sample size that
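Since the data is ordered, "one of the 100 sections" is just a positional slice, which `iloc` gives you before calling `sample`. A sketch at a smaller scale (10 sections of 100 rows instead of 100 sections of 1,000, so it runs instantly):

```python
import numpy as np
import pandas as pd

# Toy stand-in: 1,000 ordered rows = 10 sections of 100.
df = pd.DataFrame({"value": np.arange(1000)})

section = 3          # which ordered section to sample from (0-based)
section_size = 100

# Slice out the section positionally, then sample within it.
part = df.iloc[section * section_size : (section + 1) * section_size]
sample = part.sample(5, random_state=0)
```

For the question's sizes, the same code with `section_size = 1000` and `sample(50)` applies unchanged.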

Fixing set.seed for an entire session

我怕爱的太早我们不能终老 · submitted on 2019-11-29 01:10:28
I am using R to construct an agent-based model with a Monte Carlo process. This means I have many functions that use a random engine of some kind. In order to get reproducible results, I must fix the seed. But, as far as I understand, I must set the seed before every random draw or sample. This is a real pain in the neck. Is there a way to fix the seed?

```
set.seed(123)
print(sample(1:10,3))
# [1] 3 8 4
print(sample(1:10,3))
# [1] 9 10 1
set.seed(123)
print(sample(1:10,3))
# [1] 3 8 4
```

There are several options, depending on your exact needs. I suspect the first option, the simplest is not
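The premise in the question is actually mistaken: in R, as in most environments, seeding once at the top of the script is enough, because the RNG stream continues deterministically across calls; reseeding before every draw just restarts the same sequence. The same pattern, sketched with a seeded NumPy generator object (illustrative sizes):

```python
import numpy as np

# Seed once for the whole session/script.
rng = np.random.default_rng(123)
a = rng.integers(1, 11, size=3)   # first draw
b = rng.integers(1, 11, size=3)   # continues the same stream; no reseeding

# A fresh generator with the same seed reproduces the *entire* sequence.
rng2 = np.random.default_rng(123)
a2 = rng2.integers(1, 11, size=3)
b2 = rng2.integers(1, 11, size=3)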

Weighted random sampling in Elasticsearch

孤人 · submitted on 2019-11-28 23:49:07
I need to obtain a random sample from an Elasticsearch index, i.e. to issue a query that retrieves some documents from a given index with weighted probability Wj/ΣWi (where Wj is the weight of document j and ΣWi is the sum of weights of all documents in this query). Currently, I have the following query:

```
GET products/_search?pretty=true
{
  "size": 5,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": {
            "term": {"category_id": "5df3ab90-6e93-0133-7197-04383561729e"}
          }
        }
      },
      "functions": [{"random_score": {}}]
    }
  },
  "sort": [{"_score": {"order": "desc"}}]
}
```

It returns 5 items from the selected category,
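`random_score` on its own gives a uniform shuffle, not weighted selection. One standard algorithm for weighted sampling without replacement, which can be reproduced client-side or approximated in a script score, is Efraimidis-Spirakis: give each document the key u^(1/w) for a uniform u, and keep the k largest keys. A self-contained Python sketch with made-up document weights:

```python
import random

random.seed(1)

# Hypothetical doc id -> weight mapping.
docs = {"a": 1.0, "b": 5.0, "c": 0.5, "d": 3.0, "e": 2.0}

def weighted_sample(weights, k):
    """Efraimidis-Spirakis: each item gets key u**(1/w); the items with
    the k largest keys form a weighted sample without replacement."""
    keyed = [(random.random() ** (1.0 / w), item) for item, w in weights.items()]
    keyed.sort(reverse=True)
    return [item for _, item in keyed[:k]]

picked = weighted_sample(docs, 3)
```

In Elasticsearch terms, the same key could be computed in a `script_score` function combining a random value with the weight field, then sorted on descending `_score` as the question already does.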

Generate a random sample of points distributed on the surface of a unit sphere

情到浓时终转凉″ · submitted on 2019-11-28 23:25:11
I am trying to generate random points on the surface of a sphere using numpy. I have reviewed the post that explains uniform distribution here. However, I need ideas on how to generate the points only on the surface of the sphere. I have coordinates (x, y, z) and the radius of each of these spheres. I am not very well-versed with mathematics at this level and am trying to make sense of the Monte Carlo simulation. Any help will be much appreciated. Thanks, Parin

ali_m: Based on the last approach on this page, you can simply generate a vector consisting of independent samples from three standard
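The answer is cut off, but the approach it names is the classic one: draw each component from a standard normal distribution and normalize the vector. Because the 3D Gaussian is rotationally symmetric, the normalized points are uniform on the sphere's surface. A short sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sphere(n, radius=1.0):
    """Uniform points on a sphere's surface: normalize 3D standard-normal
    vectors (rotational symmetry of the Gaussian makes this uniform)."""
    v = rng.normal(size=(n, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return radius * v

pts = sample_sphere(1000, radius=2.0)
radii = np.linalg.norm(pts, axis=1)
```

Note that naive alternatives, such as uniform spherical angles, do not give a uniform surface distribution; they cluster points near the poles.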

Iterator to produce unique random order?

我怕爱的太早我们不能终老 · submitted on 2019-11-28 11:47:32
The problem is stated as follows: we have a very large number of items which are traversed through an iterator pattern (which dynamically constructs or fetches the requested item). Because the number of items is large, they cannot be kept in memory (as a list, for example). What procedure can the iterator follow in order to produce a random order of the items each time it is called? A unique random order means that eventually all items are traversed exactly once, but returned in a random order.

If the number of items is relatively small, one can solve this problem as follows:
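For the large-n case, one O(1)-memory trick is to iterate a bijection on the index space instead of shuffling: the affine map i → (a·i + c) mod n is a permutation of {0, ..., n-1} whenever gcd(a, n) = 1. A toy sketch (affine maps are statistically weak; a block-cipher-style format-preserving permutation would look more random, but the structure is the same):

```python
import math
import random

def random_permutation_iter(n, seed=0):
    """Yield each of 0..n-1 exactly once, in a pseudo-random order, using
    O(1) memory: walk the affine bijection i -> (a*i + c) mod n."""
    rnd = random.Random(seed)
    c = rnd.randrange(n)
    a = rnd.randrange(1, n)
    while math.gcd(a, n) != 1:   # gcd(a, n) = 1 makes the map a bijection
        a = rnd.randrange(1, n)
    for i in range(n):
        yield (a * i + c) % n

order = list(random_permutation_iter(10, seed=42))
```

Each distinct seed gives a different traversal order, and no list of the items is ever materialized beyond the output itself.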

Fast random weighted selection across all rows of a stochastic matrix

…衆ロ難τιáo~ · submitted on 2019-11-28 09:59:15
numpy.random.choice allows for weighted selection from a vector, i.e.

```
arr = numpy.array([1, 2, 3])
weights = numpy.array([0.2, 0.5, 0.3])
choice = numpy.random.choice(arr, p=weights)
```

selects 1 with probability 0.2, 2 with probability 0.5, and 3 with probability 0.3. What if we wanted to do this quickly, in a vectorized fashion, for a 2D array (matrix) in which each row is a vector of probabilities? That is, we want a vector of choices from a stochastic matrix? This is the super slow way:

```
import numpy as np

m = 10
n = 100  # Or some very large number
items = np.arange(m)
prob_weights =
```
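The slow way (truncated above) presumably loops over rows calling `np.random.choice` once per row. A fully vectorized alternative is the inverse-CDF trick: take the cumulative sum along each row and compare one uniform draw per row against it. A sketch with the question's dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

m, n = 10, 100
prob = rng.random((n, m))
prob /= prob.sum(axis=1, keepdims=True)   # each row sums to 1 (stochastic matrix)

# One uniform per row, compared against that row's CDF; the first CDF entry
# exceeding u is the weighted choice for that row.
cdf = prob.cumsum(axis=1)
u = rng.random((n, 1))
choices = (u < cdf).argmax(axis=1)
```

This does all n rows in a handful of array operations instead of n separate `choice` calls.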

Python random lines from subfolders

给你一囗甜甜゛ · submitted on 2019-11-28 09:28:23
I have many tasks in .txt files in multiple subfolders. I am trying to pick a total of 10 tasks randomly from these folders, their contained files, and finally a text line within a file. The selected line should be deleted or marked so it will not be picked in the next execution. This may be too broad a question, but I'd appreciate any input or direction. Here's the code I have so far:

```
#!/usr/bin/python
import random

with open('C:\\Tasks\\file.txt') as f:
    lines = random.sample(f.readlines(), 10)
print(lines)
```

Martijn Pieters: To get a proper random distribution across all these files, you'd need
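The answer is truncated, but the standard way to sample uniformly across lines scattered over many files, without loading everything, is reservoir sampling: stream every line once, keeping a fixed-size reservoir. A self-contained sketch that builds a throwaway directory tree to demonstrate (the folder layout is invented for the demo):

```python
import os
import random
import tempfile

def reservoir_sample_lines(root, k, seed=0):
    """Pick k lines uniformly at random across all .txt files under root,
    using reservoir sampling (Algorithm R): O(k) memory, one pass."""
    rnd = random.Random(seed)
    reservoir, seen = [], 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".txt"):
                continue
            with open(os.path.join(dirpath, name)) as f:
                for line in f:
                    seen += 1
                    if len(reservoir) < k:
                        reservoir.append(line)
                    else:
                        j = rnd.randrange(seen)
                        if j < k:
                            reservoir[j] = line
    return reservoir

# Demo on a temporary tree: 3 subfolders, one 20-line task file each.
root = tempfile.mkdtemp()
for sub in ("a", "b", "c"):
    os.makedirs(os.path.join(root, sub))
    with open(os.path.join(root, sub, "tasks.txt"), "w") as f:
        f.write("".join(f"{sub}-task-{i}\n" for i in range(20)))

picked = reservoir_sample_lines(root, 10)
```

Marking selected lines as consumed would then be a second pass that rewrites each file minus the picked lines.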

Numpy drawing from urn

ε祈祈猫儿з · submitted on 2019-11-28 04:17:26
Question: I want to run a relatively simple random draw in numpy, but I can't find a good way to express it. I think the best way is to describe it as drawing from an urn without replacement. I have an urn with k colors, and n_k balls of every color. I want to draw m balls, and know how many balls of every color I have. My current attempt is:

```
np.bincount(np.random.permutation(np.repeat(np.arange(k), n_k))[:m], minlength=k)
```

Here, n_k is an array of length k with the counts of the balls. It seems that's
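This urn draw is exactly the multivariate hypergeometric distribution, and modern NumPy samples it directly via the Generator API, avoiding the permute-everything approach above. A sketch with made-up counts:

```python
import numpy as np

rng = np.random.default_rng(0)

n_k = np.array([5, 3, 8])   # hypothetical counts: balls of each of k = 3 colors
m = 6                        # balls to draw without replacement

# counts[i] = how many balls of color i ended up in the draw.
counts = rng.multivariate_hypergeometric(n_k, m)
```

Unlike the `repeat`/`permutation` version, this never materializes one array element per ball, so it stays cheap even when the urn holds millions of balls.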