Probability distribution in Python

后端未结

关注

 12  2015

I have a bunch of keys that each have an unlikeliness variable. I want to randomly choose one of these keys, yet I want it to be more unlikely for unlikely (key, values) to

相关标签:

12条回答

梦毁少年i

2020-12-02 08:43

I was needed in faster functions, for non very large numbers. So here it is, in Visual C++:

#undef _DEBUG // disable linking with python25_d.dll
#include <Python.h>
#include <malloc.h>
#include <stdlib.h>

static PyObject* dieroll(PyObject *, PyObject *args)
{
    PyObject *list;
    if (!PyArg_ParseTuple(args, "O:decompress", &list))
        return NULL;

    if (!PyList_Check(list)) 
        return PyErr_Format(PyExc_TypeError, "list of numbers expected ('%s' given)", list->ob_type->tp_name), NULL;

    int size = PyList_Size(list);

    if (size < 1)
        return PyErr_Format(PyExc_TypeError, "got empty list"), NULL;

    long *array = (long*)alloca(size*sizeof(long));

    long sum = 0;
    for (int i = 0; i < size; i++) {
        PyObject *o = PyList_GetItem(list, i);

        if (!PyInt_Check(o))
            return PyErr_Format(PyExc_TypeError, "list of ints expected ('%s' found)", o->ob_type->tp_name), NULL;
        long n = PyInt_AsLong(o);
        if (n == -1 && PyErr_Occurred())
            return NULL;
        if (n < 0)
            return PyErr_Format(PyExc_TypeError, "list of positive ints expected (negative found)"), NULL;

        sum += n; //NOTE: integer overflow
        array[i] = sum;
    }

    if (sum <= 0)
        return PyErr_Format(PyExc_TypeError, "sum of numbers is not positive"), NULL;

    int r = rand() * (sum-1) / RAND_MAX; //NOTE: rand() may be too small (0x7fff).    rand() * sum may result in integer overlow.

    assert(array[size-1] == sum);
    assert(r < sum && r < array[size-1]);
    for (int i = 0; i < size; ++i)
    {
        if (r < array[i])
            return PyInt_FromLong(i);
    }
    return PyErr_Format(PyExc_TypeError, "internal error."), NULL;
}

static PyMethodDef module_methods[] = 
{
    {"dieroll", (PyCFunction)dieroll, METH_VARARGS, "random index, beased on weights" },
    {NULL}  /* Sentinel */
};

PyMODINIT_FUNC initdieroll(void) 
{
    PyObject *module = Py_InitModule3("dieroll", module_methods, "dieroll");
    if (module == NULL)
        return;
}

0 讨论(0)

南笙

2020-12-02 08:47

I would use this recipe . You will need to add a weight to your objects, but that is just a simple ratio and put them in a list of tuples (object, conviction/(sum of convictions)). This should be easy to do using a list comprehension.

0 讨论(0)
发布评论:

提交评论
- 加载中...
误落风尘

2020-12-02 08:50

Here's a better answer for a special probability distribution, the one Rex Logan's answer seems to be geared at. The distribution is like this: each object has an integer weight between 0 and 100, and its probability is in proportion to its weight. Since that's the currently accepted answer, I guess this is worth thinking about.

So keep an array of 101 bins. Each bin holds a list of all of the objects with its particular weight. Each bin also knows the total weight of all its objects.

To sample: pick a bin at random in proportion to its total weight. (Use one of the standard recipes for this -- linear or binary search.) Then pick an object from the bin uniformly at random.

To transfer an object: remove it from its bin, put it in its bin in the target, and update both bins' weights. (If you're using binary search for sampling, you must also update the running sums that uses. This is still reasonably fast since there aren't many bins.)

0 讨论(0)
发布评论:

提交评论
- 加载中...
悲&欢浪女

2020-12-02 08:50

(A year later) Walker's alias method for random objects with different probablities is very fast and very simple

0 讨论(0)
发布评论:

提交评论
- 加载中...
臣服心动

2020-12-02 08:51
The simplest thing to do is to use random.choice (which uses a uniform distribution) and vary the frequency of occurrence on the object in the source collection.
```
>>> random.choice([1, 2, 3, 4])
4
```
... vs:
```
>>> random.choice([1, 1, 1, 1, 2, 2, 2, 3, 3, 4])
2
```
So your objects could have a base occurrence rate (n) and between 1 and n objects are added to the source collection as a function of the conviction rate. This method is really simple; however, it can have significant overhead if the number of distinct objects is large or the conviction rate needs to be very fine grained.

Alternatively, if you generate more that one random number using a uniform distribution and sum them, numbers occurring near the mean are more probable that those occurring near the extremes (think of rolling two dice and the probability of getting 7 versus 12 or 2). You can then order the objects by conviction rate and generate a number using multiple die rolls which you use to calculate and index into the objects. Use numbers near the mean to index low conviction objects and numbers near the extremes to index high conviction items. You can vary the precise probability that a given object will be selected by changing the "number of sides" and number of your "dice" (it may be simpler to put the objects into buckets and use dice with a small number of sides rather than trying to associate each object with a specific result):
```
>>> die = lambda sides : random.randint(1, sides)
>>> die(6)
3
>>> die(6) + die(6) + die(6)
10
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
孤街浪徒

2020-12-02 08:55
About 3 years later...

If you use numpy, perhaps the simplest option is to use np.random.choice, which takes a list of possible values, and an optional sequence of probabilities associated with each value:
```
import numpy as np

values = ('A', 'B', 'C', 'D')
weights = (0.5, 0.1, 0.2, 0.2)

print ''.join(np.random.choice(values, size=60, replace=True, p=weights))
# ACCADAACCDACDBACCADCAAAAAAADACCDCAADDDADAAACCAAACBAAADCADABA
```
0 讨论(0)
发布评论:

提交评论
- 加载中...

上一页 1 2