How to generate random pairs of numbers in Python, including pairs with one entry being the same and excluding pairs with both entries being the same?

北恋 2021-02-06 07:26

I'm using Python and was using NumPy for this. I want to generate pairs of random numbers, and I want to exclude repeated outcomes of pairs with both entries being the same…

  • 2021-02-06 07:31

    Generator for random unique coordinates:

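    from random import randint
    
    def gencoordinates(m, n):
        # Pairs already yielded; used to reject exact repeats.
        seen = set()
    
        x, y = randint(m, n), randint(m, n)
    
        while True:
            seen.add((x, y))
            yield (x, y)
            x, y = randint(m, n), randint(m, n)
            # Redraw until the pair has not been yielded before.
            # Once all (n - m + 1)**2 pairs are used, this loop never ends.
            while (x, y) in seen:
                x, y = randint(m, n), randint(m, n)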
    

    Output:

    >>> g = gencoordinates(1, 100)
    >>> next(g)
    (42, 98)
    >>> next(g)
    (9, 5)
    >>> next(g)
    (89, 29)
    >>> next(g)
    (67, 56)
    >>> next(g)
    (63, 65)
    >>> next(g)
    (92, 66)
    >>> next(g)
    (11, 46)
    >>> next(g)
    (68, 21)
    >>> next(g)
    (85, 6)
    >>> next(g)
    (95, 97)
    >>> next(g)
    (20, 6)
    >>> next(g)
    (20, 86)
    

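    As you can see, only exact repeats are excluded: coincidentally, the x coordinate 20 appears in both (20, 6) and (20, 86), and the y coordinate 6 appears in both (85, 6) and (20, 6).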
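If you need a fixed number of distinct pairs rather than an endless stream, a rejection-free alternative is to draw distinct flat indices with the standard library's random.sample and split each index into (x, y) with integer division. This is a sketch of my own, not part of the answer above, and the function name sample_pairs is hypothetical.

```python
import random

def sample_pairs(m, n, k, rng=random):
    """Return k distinct (x, y) pairs with m <= x <= n and m <= y <= n.

    Pairs may share an x or a y value with other pairs; only exact
    repeats of a whole pair are excluded.
    """
    if m > n:
        raise ValueError("empty range: m must be <= n")
    width = n - m + 1          # number of possible values per coordinate
    total = width * width      # number of distinct (x, y) pairs
    if not 0 <= k <= total:
        raise ValueError(f"k must be between 0 and {total}")
    # random.sample accepts a range, so the pairs are never materialized.
    flat = rng.sample(range(total), k)
    # Flat index i corresponds to row i // width and column i % width.
    return [(m + i // width, m + i % width) for i in flat]
```

For example, sample_pairs(1, 100, 12) returns twelve distinct pairs in one call. Asking for more pairs than exist raises ValueError instead of looping forever.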

  • 2021-02-06 07:40

    @James Miles's answer is great, but to avoid an endless loop when you accidentally ask for more pairs than exist, I suggest the following (it also removes some of the repeated code):

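    from random import randint
    
    def gencoordinates(m, n):
        seen = set()
        x, y = randint(m, n), randint(m, n)
        # Stop once every one of the (n - m + 1)**2 possible pairs has been yielded.
        while len(seen) < (n + 1 - m)**2:
            # Redraw until the pair has not been yielded before.
            while (x, y) in seen:
                x, y = randint(m, n), randint(m, n)
            seen.add((x, y))
            yield (x, y)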
    

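    Note that an invalid range (m > n) is still not checked: on the first next() the generator either yields nothing (when m == n + 1) or fails with a ValueError raised by randint.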
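One way to address that note is to validate the range before any sampling starts. A subtlety: the body of a generator function does not run until the first next(), so a check placed inside it is deferred as well. The sketch below (hypothetical names _coordinates and gencoordinates_checked) puts the check in an ordinary function that returns the generator, so a bad range fails at call time.

```python
from random import randint

def _coordinates(m, n):
    # Same idea as the answer above: rejection-sample until every pair is used.
    seen = set()
    total = (n - m + 1) ** 2  # number of distinct (x, y) pairs in [m, n] x [m, n]
    while len(seen) < total:
        x, y = randint(m, n), randint(m, n)
        if (x, y) in seen:
            continue  # exact repeat of an earlier pair: draw again
        seen.add((x, y))
        yield (x, y)

def gencoordinates_checked(m, n):
    # Not a generator itself, so this check runs as soon as it is called.
    if m > n:
        raise ValueError(f"empty range: m={m} is greater than n={n}")
    return _coordinates(m, n)
```

With this split, gencoordinates_checked(5, 1) raises immediately, while list(gencoordinates_checked(1, 3)) returns all 9 pairs in random order and then stops.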

  • 2021-02-06 07:52

    Let's say that your x coordinates are integers from 0 to nx - 1 and your y coordinates are integers from 0 to ny - 1. For small nx and ny, a simple method is to generate all possible (x, y) coordinates with np.mgrid, reshape them into an (nx * ny, 2) array, and then sample random rows from it:

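    import numpy as np
    
    nx, ny = 100, 200
    # Every (x, y) combination as one row of an (nx * ny, 2) array.
    xy = np.mgrid[:nx, :ny].reshape(2, -1).T
    # Pick 100 distinct row indices, then gather those rows.
    sample = xy.take(np.random.choice(xy.shape[0], 100, replace=False), axis=0)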
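With NumPy 1.17 or newer, the Generator API can pick whole rows directly, which removes the separate index array and the take() call. A sketch assuming a Generator from np.random.default_rng (the original answer uses the legacy np.random functions); the seed is only there for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

nx, ny = 100, 200
# All (x, y) combinations as rows of an (nx * ny, 2) array, as above.
xy = np.mgrid[:nx, :ny].reshape(2, -1).T
# Generator.choice selects along axis 0 (rows), so each draw is one (x, y) pair.
sample = rng.choice(xy, size=100, replace=False, axis=0)
```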
    

    Creating the array of all possible coordinates can become expensive if nx and/or ny is very large, in which case it might be better to use a generator object and keep track of previously used coordinates, as in James' answer.


    Following @morningsun's suggestion, an alternative method is to sample from the nx*ny indices into the flattened array, then convert these directly to x, y coordinates. This avoids constructing the whole nx*ny array of possible (x, y) combinations.

    For comparison, here's a version of my original approach generalized for N-dimensional arrays, plus a version that uses the new approach:

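    import numpy as np
    
    def sample_comb1(dims, nsamp):
        # Build every index combination, one per row, then pick nsamp distinct rows.
        perm = np.indices(dims).reshape(len(dims), -1).T
        idx = np.random.choice(perm.shape[0], nsamp, replace=False)
        return perm.take(idx, axis=0)
    
    def sample_comb2(dims, nsamp):
        # Pick nsamp distinct flat indices, then convert them to per-axis coordinates.
        idx = np.random.choice(np.prod(dims), nsamp, replace=False)
        return np.vstack(np.unravel_index(idx, dims)).T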
    

    There's not a huge difference in practice, but the benefits of the second method become a bit more apparent for larger arrays:

    In [1]: %timeit sample_comb1((100, 200), 100)
    100 loops, best of 3: 2.59 ms per loop
    
    In [2]: %timeit sample_comb2((100, 200), 100)
    100 loops, best of 3: 2.4 ms per loop
    
    In [3]: %timeit sample_comb1((1000, 2000), 100)
    1 loops, best of 3: 341 ms per loop
    
    In [4]: %timeit sample_comb2((1000, 2000), 100)
    1 loops, best of 3: 319 ms per loop
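Both sample_comb1 and sample_comb2 call the legacy np.random.choice, which, with replace=False, draws a full permutation of the population and keeps the first nsamp entries. That is probably why the two timings stay close even for a 1000 x 2000 grid. A variant of sample_comb2 built on the newer Generator API is sketched here (sample_comb_rng is a hypothetical name); Generator.choice is not tied to that full permutation, but I have not timed it.

```python
import numpy as np

def sample_comb_rng(dims, nsamp, rng=None):
    """Draw nsamp distinct index tuples from a grid of shape dims."""
    rng = np.random.default_rng() if rng is None else rng
    # Sample distinct flat (C-order) indices into an array of shape dims...
    idx = rng.choice(int(np.prod(dims)), size=nsamp, replace=False)
    # ...and convert each one to a coordinate per axis. In 2-D this is
    # simply (i // dims[1], i % dims[1]).
    return np.column_stack(np.unravel_index(idx, dims))
```

np.column_stack gives the same (nsamp, len(dims)) layout as np.vstack(...).T in the original functions.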
    


    If you have scikit-learn installed, sklearn.utils.random.sample_without_replacement offers a much faster method for generating random indices without replacement using Floyd's algorithm:

    import numpy as np
    from sklearn.utils.random import sample_without_replacement
    
    def sample_comb3(dims, nsamp):
        # Draw nsamp distinct flat indices, then convert them to per-axis coordinates.
        idx = sample_without_replacement(np.prod(dims), nsamp)
        return np.vstack(np.unravel_index(idx, dims)).T
    
    In [5]: %timeit sample_comb3((1000, 2000), 100)
    The slowest run took 4.49 times longer than the fastest. This could mean that an
    intermediate result is being cached 
    10000 loops, best of 3: 53.2 µs per loop
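For readers without scikit-learn, here is a pure-Python sketch of Floyd's algorithm itself. It is my own version of the textbook algorithm, not scikit-learn's implementation, and the names floyd_sample and sample_pairs_floyd are hypothetical. It needs only k random draws and O(k) memory, regardless of how large the grid is.

```python
import random

def floyd_sample(population_size, k, rng=random):
    """Return a set of k distinct integers drawn uniformly from range(population_size)."""
    if not 0 <= k <= population_size:
        raise ValueError("k must be between 0 and population_size")
    chosen = set()
    for j in range(population_size - k, population_size):
        t = rng.randint(0, j)  # uniform over 0..j inclusive
        # If t was already picked, take j instead. j cannot be in the set yet,
        # because earlier iterations only drew values below j.
        chosen.add(j if t in chosen else t)
    return chosen

def sample_pairs_floyd(nx, ny, k, rng=random):
    # Map each flat index to (x, y) with x in range(nx) and y in range(ny).
    # The result comes from a set, so shuffle it if the order matters.
    return [divmod(i, ny) for i in floyd_sample(nx * ny, k, rng)]
```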
    