I have a very large 2D array which looks something like this:
a = [[a1, b1, c1],
     [a2, b2, c2],
     ...,
     [an, bn, cn]]
Using numpy, is there an easy way to get a new 2D array with, e.g., 2 random rows from the initial array a (without replacement)?
I see permutation has been suggested. In fact it can be made into one line:
>>> A = np.random.randint(5, size=(10,3))
>>> np.random.permutation(A)[:2]
array([[0, 3, 0],
       [3, 1, 2]])
If you want to generate multiple random subsets of rows, for example if you're doing RANSAC, you can do the following (a quick shape check follows the block):
import numpy as np

num_pop = 10       # number of rows in the population
num_samples = 2    # number of subsets to draw
pop_in_sample = 3  # rows per subset

rows_to_sample = np.random.random([num_pop, 5])
# one random number per (sample, row); argsorting each sample's numbers
# and keeping the first pop_in_sample indices gives a random subset of
# row indices, without replacement within each sample
random_numbers = np.random.random([num_samples, num_pop])
samples = np.argsort(random_numbers, axis=1)[:, :pop_in_sample]
row_subsets = rows_to_sample[samples, :]
# row_subsets has shape [num_samples, pop_in_sample, 5]
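As a quick sanity check, using the variables from the snippet above:
print(samples.shape)      # (2, 3)    -> two subsets of three row indices each
print(row_subsets.shape)  # (2, 3, 5) -> each subset holds three 5-feature rows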
>>> A = np.random.randint(5, size=(10,3))
>>> A
array([[1, 3, 0],
       [3, 2, 0],
       [0, 2, 1],
       [1, 1, 4],
       [3, 2, 2],
       [0, 1, 0],
       [1, 3, 1],
       [0, 4, 1],
       [2, 4, 2],
       [3, 3, 1]])
>>> idx = np.random.randint(10, size=2)
>>> idx
array([7, 6])
>>> A[idx,:]
array([[0, 4, 1],
       [1, 3, 1]])
Putting it together for a general case:
A[np.random.randint(A.shape[0], size=2), :]
For sampling without replacement (numpy 1.7.0+):
A[np.random.choice(A.shape[0], 2, replace=False), :]
I do not believe there is a good way to generate a random list without replacement before numpy 1.7. Perhaps you can set up a small function that ensures the two values are not the same.
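A minimal sketch of that idea, assuming you only need two distinct rows (the helper name is made up):
import numpy as np

def two_distinct_rows(A):
    # redraw the second index until it differs from the first,
    # so the two sampled rows are guaranteed to be distinct
    i = np.random.randint(A.shape[0])
    j = np.random.randint(A.shape[0])
    while j == i:
        j = np.random.randint(A.shape[0])
    return A[[i, j], :]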
This is an old post, but this is what works best for me:
A[np.random.choice(A.shape[0], num_rows_2_sample, replace=False)]
Change replace=False to True to get the same thing, but with replacement.
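For completeness, a runnable version of the same call (num_rows_2_sample is just a placeholder for however many rows you want):
import numpy as np

A = np.random.randint(5, size=(10, 3))
num_rows_2_sample = 2

# two distinct rows; pass replace=True to allow repeated rows
sampled = A[np.random.choice(A.shape[0], num_rows_2_sample, replace=False)]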
Another option is to create a random mask if you just want to down-sample your data by a certain factor. Say I want to down-sample to 25% of my original data set, which is currently held in the array data_arr:
import numpy

# generate a random boolean mask the length of data_arr
# use p=0.75 for False and p=0.25 for True
mask = numpy.random.choice([False, True], len(data_arr), p=[0.75, 0.25])
Now data_arr[mask] returns ~25% of the rows, randomly sampled.
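A minimal end-to-end sketch (the data in data_arr is just illustrative):
import numpy

data_arr = numpy.random.random((1000, 3))
mask = numpy.random.choice([False, True], len(data_arr), p=[0.75, 0.25])
subsample = data_arr[mask]
print(len(subsample) / len(data_arr))  # roughly 0.25, varies run to run
Note that with a mask the subsample size is itself random (it fluctuates around 25%); use np.random.choice with a fixed size if you need exactly N rows.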
An alternative way of doing it is by using the choice method of the Generator class (see https://github.com/numpy/numpy/issues/10835):
import numpy as np
# generate the random array
A = np.random.randint(5, size=(10,3))
# use the choice method of the Generator class
rng = np.random.default_rng()
A_sampled = rng.choice(A, 2)
which yields sampled data such as:
array([[1, 3, 2],
       [1, 2, 1]])
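Generator.choice samples with replacement by default; for sampling rows without replacement you can pass replace=False (axis=0, the default, selects rows):
# two distinct rows; axis=0 is the default for 2D input
A_sampled = rng.choice(A, 2, replace=False, axis=0)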
The running times are also profiled for comparison:
%timeit rng.choice(A, 2)
15.1 µs ± 115 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.random.permutation(A)[:2]
4.22 µs ± 83.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit A[np.random.randint(A.shape[0], size=2), :]
10.6 µs ± 418 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
But when the array gets big, e.g. A = np.random.randint(10, size=(1000, 300)), working on the index is the best way:
%timeit A[np.random.randint(A.shape[0], size=50), :]
17.6 µs ± 657 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit rng.choice(A, 50)
22.3 µs ± 134 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.random.permutation(A)[:50]
143 µs ± 1.33 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
So the permutation method seems to be the most efficient one when the array is small, while working on the index is the optimal solution when the array gets big.
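If you want to reproduce the comparison outside IPython, here is a minimal timeit sketch (array sizes follow the examples above; absolute numbers will differ by machine):
import timeit
import numpy as np

A = np.random.randint(10, size=(1000, 300))
rng = np.random.default_rng()

candidates = {
    "index":       lambda: A[np.random.randint(A.shape[0], size=50), :],
    "rng.choice":  lambda: rng.choice(A, 50),
    "permutation": lambda: np.random.permutation(A)[:50],
}
for label, stmt in candidates.items():
    t = timeit.timeit(stmt, number=10_000)
    print(f"{label:12s} {t / 10_000 * 1e6:.2f} µs per loop")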