How to choose keys from a python dictionary based on weighted probability?

Question

I have a Python dictionary where keys represent some item and values represent some (normalized) weighting for said item. For example:

d = {'a': 0.0625, 'c': 0.625, 'b': 0.3125}
# Note that sum(d.values()) == 1 for all `d`

Given this mapping of items to weights, how can I choose a key from d such that 6.25% of the time the result is 'a', 31.25% of the time the result is 'b', and 62.5% of the time the result is 'c'?

Answer 1:

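import random

def weighted_random_by_dct(dct):
    rand_val = random.random()  # uniform in [0.0, 1.0)
    total = 0
    for k, v in dct.items():
        total += v  # running sum of the weights seen so far
        if rand_val <= total:
            return k
    # Only reachable if the weights sum to less than rand_val
    assert False, 'unreachable'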

This should do the trick. It goes through each key while keeping a running sum of the weights, and as soon as the random value (between 0 and 1) falls within the current key's slot, it returns that key.
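
On Python 3.6 or later, the standard library's random.choices can make the same kind of weighted draw without a hand-written loop. A minimal sketch, assuming the dictionary d from the question; the function name weighted_choice is hypothetical and not part of the original answer:

```python
import random

def weighted_choice(dct):
    # Hypothetical helper: draw one key, weighted by its value.
    keys = list(dct)
    weights = list(dct.values())
    # random.choices returns a list of k samples (k=1 by default).
    return random.choices(keys, weights=weights)[0]

d = {'a': 0.0625, 'c': 0.625, 'b': 0.3125}
print(weighted_choice(d))
```

random.choices treats the weights as relative, so they do not have to sum to 1.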

Answer 2:

If you're planning to do this a lot, you could use NumPy to select keys with weighted probabilities via np.random.choice(). The example below draws 10,000 keys according to the given probabilities. Note that probs must be listed in the same order as keys.

import numpy as np

probs = [0.0625, 0.625, 0.3125]
keys = ['a', 'c', 'b']

choice_list = np.random.choice(keys, 10000, replace=True, p=probs)
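
If NumPy is not available, repeated draws can still be made cheap by computing the cumulative weights once and locating each random value with a binary search. A rough sketch using only the standard library; the class name WeightedSampler is made up for illustration:

```python
import bisect
import itertools
import random

class WeightedSampler:
    """Hypothetical helper: precompute cumulative weights once, sample many times."""

    def __init__(self, weights_by_key):
        self.keys = list(weights_by_key)
        # Running totals, e.g. [0.0625, 0.6875, 1.0] for the question's weights.
        self.cumulative = list(itertools.accumulate(weights_by_key.values()))

    def sample(self):
        # Scale by the total so the weights need not sum to exactly 1.
        r = random.random() * self.cumulative[-1]
        # Index of the first cumulative total strictly greater than r.
        return self.keys[bisect.bisect_right(self.cumulative, r)]

sampler = WeightedSampler({'a': 0.0625, 'c': 0.625, 'b': 0.3125})
draws = [sampler.sample() for _ in range(10000)]
print({k: draws.count(k) for k in sampler.keys})
```

After the one-time setup, each draw costs a binary search over the cumulative totals rather than a full pass over the dictionary.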

Answer 3:

Not sure what your use case is here, but you can check out the frequency distribution and probability distribution classes in the NLTK package, which handle all the nitty-gritty details.

FreqDist is an extension of a counter, which can be passed to a ProbDistI interface. The ProbDistI interface exposes a "generate()" method which can be used to sample the distribution, as well as a "prob(sample)" method that can be used to get the probability of a given key.

For your case you'd want to use Maximum Likelihood Estimation, so the MLEProbDist. If you want to smooth the distribution, you could try LaplaceProbDist or SimpleGoodTuringProbDist.

For example:

from nltk.probability import FreqDist, MLEProbDist

d = {'a': 6.25, 'c': 62.5, 'b': 31.25}
freq_dist = FreqDist(d)
prob_dist = MLEProbDist(freq_dist)

print(prob_dist.prob('a'))
print(prob_dist.prob('b'))
print(prob_dist.prob('c'))
print(prob_dist.prob('d'))

This prints 0.0625, 0.3125, 0.625 and 0.0, one value per line.

To generate a new sample, you can use:

prob_dist.generate()

Answer 4:

If you are able to use numpy, you can use the numpy.random.choice function, like so:

import numpy as np

d = {'a': 0.0625, 'c': 0.625, 'b': 0.3125}

def pick_by_weight(d):
    d_choices = []
    d_probs = []
    for k, v in d.items():
        d_choices.append(k)
        d_probs.append(v)
    return np.random.choice(d_choices, 1, p=d_probs)[0]


d = {'a': 0.0625, 'c': 0.625, 'b': 0.3125}
choice = pick_by_weight(d)

Answer 5:

It may be useful to keep an "inverted" dictionary, where the keys are the weight values and the values are lists of the original keys that carry that weight. That way, keys that share a weight are grouped together, which makes it easier to choose among them:

from collections import defaultdict
import random

d = {'a': 0.0625, 'd': 0.0625, 'c': 0.625, 'b': 0.3125}

inverted_dict = defaultdict(list)

for k, v in d.items():
    inverted_dict[v].append(k)

# First pick a weight group: each group's chance is its weight times the
# number of keys sharing it (random.choices normalizes the weights, so they
# need not sum to 1). Then pick uniformly among the keys in that group.
groups = list(inverted_dict)
group_weights = [w * len(inverted_dict[w]) for w in groups]
chosen_weight = random.choices(groups, weights=group_weights)[0]
print(random.choice(inverted_dict[chosen_weight]))

Answer 6:

As I understand it, you need a simple random function that generates a number uniformly between 0 and 1. If the value falls between 0 and 0.0625, you select key a; if it falls between 0.0625 and (0.0625 + 0.625), you select key c; and so on. This is what another answer here already describes.

Since the random numbers are generated uniformly, keys with a larger weight own a wider interval and are therefore expected to be selected more often than the others.
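
To make the intervals concrete, the sketch below maps a few fixed values of the random number to keys. It assumes the dictionary is iterated in the order a, c, b (insertion order, which Python 3.7+ guarantees); the function name key_for is hypothetical:

```python
def key_for(u, dct):
    # Hypothetical helper: return the key whose interval contains u,
    # where u is assumed to lie in [0, sum of weights).
    upper = 0.0
    for key, weight in dct.items():
        upper += weight  # upper bound of this key's interval
        if u < upper:
            return key
    raise ValueError("u is outside the range covered by the weights")

d = {'a': 0.0625, 'c': 0.625, 'b': 0.3125}
print(key_for(0.03, d))  # 0.03 is in [0, 0.0625), so 'a'
print(key_for(0.50, d))  # 0.50 is in [0.0625, 0.6875), so 'c'
print(key_for(0.90, d))  # 0.90 is in [0.6875, 1.0), so 'b'
```

Passing random.random() as u would then give the weighted selection described above.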

Source: https://stackoverflow.com/questions/40927221/how-to-choose-keys-from-a-python-dictionary-based-on-weighted-probability

Tags

python

random

probability