Probability distribution in Python

误落风尘 2020-12-02 08:12

I have a bunch of keys that each have an unlikeliness variable. I want to randomly choose one of these keys, yet I want it to be more unlikely for unlikely (key, values) to

12 answers
  • 2020-12-02 08:43

    I needed a faster function, for numbers that are not very large. So here it is, in Visual C++:

    #undef _DEBUG // disable linking with python25_d.dll
    #include <Python.h>
    #include <malloc.h>
    #include <stdlib.h>
    
    static PyObject* dieroll(PyObject *, PyObject *args)
    {
        PyObject *list;
        if (!PyArg_ParseTuple(args, "O:dieroll", &list))
            return NULL;
    
        if (!PyList_Check(list)) 
            return PyErr_Format(PyExc_TypeError, "list of numbers expected ('%s' given)", list->ob_type->tp_name), NULL;
    
        int size = PyList_Size(list);
    
        if (size < 1)
            return PyErr_Format(PyExc_TypeError, "got empty list"), NULL;
    
        long *array = (long*)alloca(size*sizeof(long));
    
        long sum = 0;
        for (int i = 0; i < size; i++) {
            PyObject *o = PyList_GetItem(list, i);
    
            if (!PyInt_Check(o))
                return PyErr_Format(PyExc_TypeError, "list of ints expected ('%s' found)", o->ob_type->tp_name), NULL;
            long n = PyInt_AsLong(o);
            if (n == -1 && PyErr_Occurred())
                return NULL;
            if (n < 0)
                return PyErr_Format(PyExc_TypeError, "list of positive ints expected (negative found)"), NULL;
    
            sum += n; //NOTE: integer overflow
            array[i] = sum;
        }
    
        if (sum <= 0)
            return PyErr_Format(PyExc_TypeError, "sum of numbers is not positive"), NULL;
    
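        // Scale rand() into [0, sum) in floating point, which avoids the integer overflow of rand() * sum.
        // NOTE: the range of rand() may be too small (RAND_MAX can be 0x7fff).
        long r = (long)((double)rand() / ((double)RAND_MAX + 1.0) * sum);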
    
        assert(array[size-1] == sum);
        assert(r < sum && r < array[size-1]);
        for (int i = 0; i < size; ++i)
        {
            if (r < array[i])
                return PyInt_FromLong(i);
        }
        return PyErr_Format(PyExc_TypeError, "internal error."), NULL;
    }
    
    static PyMethodDef module_methods[] = 
    {
        {"dieroll", (PyCFunction)dieroll, METH_VARARGS, "random index, based on weights" },
        {NULL}  /* Sentinel */
    };
    
    PyMODINIT_FUNC initdieroll(void) 
    {
        PyObject *module = Py_InitModule3("dieroll", module_methods, "dieroll");
        if (module == NULL)
            return;
    }
    
  • 2020-12-02 08:47

    I would use this recipe. You will need to add a weight to your objects, but that is just a simple ratio: put them in a list of tuples (object, conviction / (sum of convictions)). This should be easy to do using a list comprehension.
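
    The linked recipe is not reproduced here, so the following is only a minimal sketch of the idea described above: normalise the convictions into (object, ratio) tuples with a list comprehension, then pick by walking the running sum. The names normalize and weighted_pick are hypothetical, and the sketch assumes the convictions are stored in a dict.

```python
import random

def normalize(convictions):
    """Turn {object: conviction} into a list of (object, ratio) tuples."""
    total = sum(convictions.values())
    if total <= 0:
        raise ValueError("sum of convictions must be positive")
    return [(obj, c / total) for obj, c in convictions.items()]

def weighted_pick(pairs, rng=random):
    """Pick one object from (object, ratio) pairs by walking the running sum."""
    r = rng.random()  # uniform in [0.0, 1.0)
    running = 0.0
    for obj, ratio in pairs:
        running += ratio
        if r < running:
            return obj
    return pairs[-1][0]  # guard against floating-point round-off

pairs = normalize({"a": 1, "b": 3})
choice = weighted_pick(pairs)
```

    Note that this makes an object with a larger ratio more likely to be picked; if a large conviction should make an object less likely, the ratio has to be inverted first.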

  • 2020-12-02 08:50

    Here's a better answer for a special probability distribution, the one Rex Logan's answer seems to be geared at. The distribution is like this: each object has an integer weight between 0 and 100, and its probability is in proportion to its weight. Since that's the currently accepted answer, I guess this is worth thinking about.

    So keep an array of 101 bins. Each bin holds a list of all of the objects with its particular weight. Each bin also knows the total weight of all its objects.

    To sample: pick a bin at random in proportion to its total weight. (Use one of the standard recipes for this -- linear or binary search.) Then pick an object from the bin uniformly at random.

    To transfer an object: remove it from its bin, put it in the matching bin in the target, and update both bins' weights. (If you're using binary search for sampling, you must also update the running sums it uses. This is still reasonably fast since there aren't many bins.)
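
    A rough sketch of the bin structure described above, using the linear-search variant for picking a bin. The class name BinnedSampler and the helper transfer are made up for illustration, and the caller is assumed to know each object's weight.

```python
import random

class BinnedSampler:
    """Objects with integer weights 0..100, grouped into one bin per weight."""

    MAX_WEIGHT = 100

    def __init__(self):
        # bins[w] holds every object of weight w; totals[w] == w * len(bins[w])
        self.bins = [[] for _ in range(self.MAX_WEIGHT + 1)]
        self.totals = [0] * (self.MAX_WEIGHT + 1)

    def add(self, obj, weight):
        if not 0 <= weight <= self.MAX_WEIGHT:
            raise ValueError("weight must be between 0 and 100")
        self.bins[weight].append(obj)
        self.totals[weight] += weight

    def remove(self, obj, weight):
        self.bins[weight].remove(obj)  # linear in the size of that one bin
        self.totals[weight] -= weight

    def sample(self, rng=random):
        grand_total = sum(self.totals)
        if grand_total <= 0:
            raise ValueError("no object with a positive weight")
        r = rng.randrange(grand_total)  # integer in [0, grand_total)
        # Linear search over the 101 bins, in proportion to each bin's total.
        for weight, total in enumerate(self.totals):
            if r < total:
                return rng.choice(self.bins[weight])  # uniform within the bin
            r -= total
        raise AssertionError("unreachable: totals changed during sampling")

def transfer(obj, weight, source, target):
    """Move obj from one sampler to another, updating both bins' totals."""
    source.remove(obj, weight)
    target.add(obj, weight)
```

    Removing an object with list.remove is linear in the bin size; a per-bin set or an index map would be one way to speed that up if bins grow large.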

  • 2020-12-02 08:50

    (A year later) Walker's alias method for random objects with different probabilities is very fast and very simple.
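
    The implementation this answer linked to is not included here. As an illustration only, this is a compact sketch of the alias method in Vose's formulation: building the table takes time linear in the number of weights, and each draw takes constant time. The function names build_alias_table and alias_sample are hypothetical.

```python
import random

def build_alias_table(weights):
    """Return (prob, alias) lists for non-negative weights with a positive sum."""
    n = len(weights)
    total = sum(weights)
    if n == 0 or total <= 0 or any(w < 0 for w in weights):
        raise ValueError("need non-negative weights with a positive sum")
    scaled = [w * n / total for w in weights]  # average becomes 1.0
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        little = small.pop()
        big = large.pop()
        prob[little] = scaled[little]  # chance of keeping column `little`
        alias[little] = big            # otherwise fall through to `big`
        scaled[big] -= 1.0 - scaled[little]
        (small if scaled[big] < 1.0 else large).append(big)
    for i in small + large:            # leftovers equal 1.0 up to round-off
        prob[i] = 1.0
        alias[i] = i
    return prob, alias

def alias_sample(prob, alias, rng=random):
    """Draw one index in O(1): pick a column, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]

prob, alias = build_alias_table([1, 0, 3])
index = alias_sample(prob, alias)
```

    The table has to be rebuilt whenever a weight changes, so this approach suits many draws from a fixed set of weights.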

  • 2020-12-02 08:51

    The simplest thing to do is to use random.choice (which uses a uniform distribution) and vary how often each object occurs in the source collection.

    >>> import random
    >>> random.choice([1, 2, 3, 4])
    4
    

    ... vs:

    >>> random.choice([1, 1, 1, 1, 2, 2, 2, 3, 3, 4])
    2
    

    So your objects could have a base occurrence rate (n) and between 1 and n objects are added to the source collection as a function of the conviction rate. This method is really simple; however, it can have significant overhead if the number of distinct objects is large or the conviction rate needs to be very fine grained.
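
    As a small sketch of this approach, the source collection can be built from a mapping of object to repeat count. The helper name expand and the counts are illustrative assumptions; how a conviction rate maps to a count is left to the caller.

```python
import random

def expand(counts):
    """Repeat each object `count` times so random.choice favours frequent ones."""
    pool = []
    for obj, count in counts.items():
        pool.extend([obj] * count)
    return pool

# Same collection as the second example above: 1 appears four times, 4 once.
pool = expand({1: 4, 2: 3, 3: 2, 4: 1})
pick = random.choice(pool)
```

    The pool grows with the sum of the counts, which is the overhead mentioned above.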

    Alternatively, if you generate more than one random number using a uniform distribution and sum them, numbers near the mean are more probable than those near the extremes (think of rolling two dice and the probability of getting 7 versus 12 or 2). You can then order the objects by conviction rate and generate a number using multiple die rolls, which you use to calculate an index into the objects. Use numbers near the mean to index low conviction objects and numbers near the extremes to index high conviction items. You can vary the precise probability that a given object will be selected by changing the "number of sides" and the number of your "dice" (it may be simpler to put the objects into buckets and use dice with a small number of sides rather than trying to associate each object with a specific result):

    >>> die = lambda sides : random.randint(1, sides)
    >>> die(6)
    3
    >>> die(6) + die(6) + die(6)
    10
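
    One possible way to turn such a roll into a bucket index is to use the distance of the total from the mean. This is only a sketch under assumptions: the function names and the rounding rule are made up here, and the resulting probabilities depend on the number of dice, sides, and buckets, so they are worth checking before relying on them.

```python
import random

def roll(n_dice, sides, rng=random):
    """Sum of n_dice rolls of a die with the given number of sides."""
    return sum(rng.randint(1, sides) for _ in range(n_dice))

def pick_by_dice(buckets, n_dice=2, sides=6, rng=random):
    """buckets is ordered from lowest to highest conviction.

    Totals near the mean map to the first bucket, totals near the
    extremes map to the last one.
    """
    if n_dice < 1 or sides < 2:
        raise ValueError("need at least one die with at least two sides")
    lowest, highest = n_dice, n_dice * sides
    mean = (lowest + highest) / 2
    max_dist = (highest - lowest) / 2
    dist = abs(roll(n_dice, sides, rng) - mean)  # 0 at the mean
    index = round(dist / max_dist * (len(buckets) - 1))
    return buckets[index]

bucket = pick_by_dice(["low", "medium", "high"])
```

    With two six-sided dice and three buckets, this mapping picks "low" for totals 6 to 8, "medium" for 4, 5, 9 and 10, and "high" for 2, 3, 11 and 12.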
    
  • 2020-12-02 08:55

    About 3 years later...

    If you use numpy, perhaps the simplest option is to use np.random.choice, which takes a list of possible values, and an optional sequence of probabilities associated with each value:

    import numpy as np
    
    values = ('A', 'B', 'C', 'D')
    weights = (0.5, 0.1, 0.2, 0.2)
    
    print(''.join(np.random.choice(values, size=60, replace=True, p=weights)))
    # ACCADAACCDACDBACCADCAAAAAAADACCDCAADDDADAAACCAAACBAAADCADABA
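
    If numpy is not available, the standard library has a similar call, random.choices (Python 3.6 and later), which always samples with replacement. A short sketch with the same values and weights:

```python
import random

values = ('A', 'B', 'C', 'D')
weights = (0.5, 0.1, 0.2, 0.2)

# k is the number of draws; the weights do not need to sum to 1.
sample = random.choices(values, weights=weights, k=60)
print(''.join(sample))
```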
    