问题

There is already a multi key dict in python and also a multivalued dict. I needed a python dictionary which is both:

example:

# probabilistically fetch any one of baloon, toy or car
d['red','blue','green']== "baloon" or "car" or "toy"

Probability of d['red']==d['green'] is high and Probability of d['red']!=d['red'] is low but possible

the single output value should be probabilistically determined (fuzzy) based on a rule from keys eg:in above case rule could be if keys have both "red" and "blue" then return "baloon" 80% of time if only blue then return "toy" 15% of time else "car" 5% of time.

The setitem method should be designed such that following is possible:

d["red", "blue"] =[
    ("baloon",haseither('red','green'),0.8),
    ("toy",.....)
    ,....
]

Above assigns multiple values to the dictionary with a predicate function and corresponding probability. And instead of the assignment list above even a dictionary as assignment would be preferable:

d["red", "blue"] ={ 
    "baloon": haseither('red','green',0.8),
    "toy": hasonly("blue",0.15),
    "car": default(0.05)
}

In the above baloon will be returned 80% of time if "red" or green is present , return toy 15% of time if blue present and return car 5% of time without any condition.

Are there any existing data structures which already satisfy the above requirements in python? if no then how can multikeydict code be modified to meet the above requirements in python?

if using dictionary then there can be a configuration file or use of appropriate nested decorators which configures the above probabilistic predicate logics without having to hard code if \else statements .

Note: Above is a useful automata for a rule based auto responder application hence do let me know if any similar rule based framework is available in python even if it does not use the dictionary structure?

回答1:

Simulated MultiKey Dictionary

multi_key_dict did not allow __getitem__() with multiple keys at onces...

(e.g. d["red", "green"])

A multi key can be simulated with tuple or set keys. If order does not matter, set seems the best (actually the hashable frozen set, so that ["red", "blue"] is the same a ["blue", "red"].

Simulated MultiVal Dictionary

Multi values are inherent by using certain datatypes, it can be any storage element that may be conveniently indexed. A standard dict should provide that.

Non-determinism

Using a probability distribution defined by the rules and assumptions¹, non-deterministic selection is performed using this recipe from the python docs.

`MultiKeyMultiValNonDeterministicDict` Class

^{What a name. \o/^{^-nice!}}

This class takes multiple keys that define a probabilistic rule set of multiple values. During item creation (__setitem__()) all value probabilities are precomputed for all combinations of keys¹. During item access (__getitem__()) the precomputed probability distribution is selected and the result is evaluated based on a random weighted selection.

Definition

import random
import operator
import bisect
import itertools

# or use itertools.accumulate in python 3
def accumulate(iterable, func=operator.add):
    'Return running totals'
    # accumulate([1,2,3,4,5]) --> 1 3 6 10 15
    # accumulate([1,2,3,4,5], operator.mul) --> 1 2 6 24 120
    it = iter(iterable)
    try:
        total = next(it)
    except StopIteration:
        return
    yield total
    for element in it:
        total = func(total, element)
        yield total

class MultiKeyMultiValNonDeterministicDict(dict):

    def key_combinations(self, keys):
        """get all combinations of keys"""
        return [frozenset(subset) for L in range(0, len(keys)+1) for subset in itertools.combinations(keys, L)]

    def multi_val_rule_prob(self, rules, rule):
        """
        assign probabilities for each value, 
        spreading undefined result probabilities
        uniformly over the leftover results not defined by rule.
        """
        all_results = set([result for result_probs in rules.values() for result in result_probs])
        prob = rules[rule]
        leftover_prob = 1.0 - sum([x for x in prob.values()])
        leftover_results = len(all_results) - len(prob)
        for result in all_results:
            if result not in prob:
                # spread undefined prob uniformly over leftover results
                prob[result] = leftover_prob/leftover_results
        return prob

    def multi_key_rule_prob(self, key, val):
        """
        assign probability distributions for every combination of keys,
        using the default for combinations not defined in rule set
        """ 
        combo_probs = {}
        for combo in self.key_combinations(key):
            if combo in val:
                result_probs = self.multi_val_rule_prob(val, combo).items()
            else:
                result_probs = self.multi_val_rule_prob(val, frozenset([])).items()
            combo_probs[combo] = result_probs
        return combo_probs

    def weighted_random_choice(self, weighted_choices):
        """make choice from weighted distribution"""
        choices, weights = zip(*weighted_choices)
        cumdist = list(accumulate(weights))
        return choices[bisect.bisect(cumdist, random.random() * cumdist[-1])]

    def __setitem__(self, key, val):
        """
        set item in dictionary, 
        assigns values to keys with precomputed probability distributions
        """

        precompute_val_probs = self.multi_key_rule_prob(key, val)        
        # use to show ALL precomputed probabilities for key's rule set
        # print precompute_val_probs        

        dict.__setitem__(self, frozenset(key), precompute_val_probs)

    def __getitem__(self, key):
        """
        get item from dictionary, 
        randomly select value based on rule probability
        """
        key = frozenset([key]) if isinstance(key, str) else frozenset(key)             
        val = None
        weighted_val = None        
        if key in self.keys():
            val = dict.__getitem__(self, key)
            weighted_val = val[key]
        else:
            for k in self.keys():
                if key.issubset(k):
                    val = dict.__getitem__(self, k)
                    weighted_val = val[key]

        # used to show probabality for key
        # print weighted_val

        if weighted_val:
            prob_results = self.weighted_random_choice(weighted_val)
        else:
            prob_results = None
        return prob_results

Usage

d = MultiKeyMultiValNonDeterministicDict()

d["red","blue","green"] = {
    # {rule_set} : {result: probability}
    frozenset(["red", "green"]): {"ballon": 0.8},
    frozenset(["blue"]): {"toy": 0.15},
    frozenset([]): {"car": 0.05}
}

Testing

Check the probabilities

N = 10000
red_green_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
red_blue_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
blue_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
red_blue_green_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
default_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}

for _ in xrange(N):
    red_green_test[d["red","green"]] += 1.0
    red_blue_test[d["red","blue"]] += 1.0
    blue_test[d["blue"]] += 1.0
    default_test[d["green"]] += 1.0
    red_blue_green_test[d["red","blue","green"]] += 1.0

print 'red,green test      =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in red_green_test.items())
print 'red,blue test       =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in red_blue_test.items())
print 'blue test           =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in blue_test.items())
print 'default test        =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in default_test.items())
print 'red,blue,green test =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in red_blue_green_test.items())

red,green test      = car: 09.89% toy: 10.06% ballon: 80.05%
red,blue test       = car: 05.30% toy: 47.71% ballon: 46.99%
blue test           = car: 41.69% toy: 15.02% ballon: 43.29%
default test        = car: 05.03% toy: 47.16% ballon: 47.81%
red,blue,green test = car: 04.85% toy: 49.20% ballon: 45.95%

Probabilities match rules!

Footnotes

Distribution Assumption

Since the rule set is not fully defined, assumptions are made about the probability distributions, most of this is done in multi_val_rule_prob(). Basically any undefined probability will be spread uniformly over the remaining values. This is done for all combinations of keys, and creates a generalized key interface for the random weighted selection.

Given the example rule set

d["red","blue","green"] = {
    # {rule_set} : {result: probability}
    frozenset(["red", "green"]): {"ballon": 0.8},
    frozenset(["blue"]): {"toy": 0.15},
    frozenset([]): {"car": 0.05}
}

this will create the following distributions

'red'           = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
'green'         = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
'blue'          = [('car', 0.425), ('toy', 0.150), ('ballon', 0.425)]
'blue,red'      = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
'green,red'     = [('car', 0.098), ('toy', 0.098), ('ballon', 0.800)]
'blue,green'    = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
'blue,green,red'= [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
 default        = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]

If this is incorrect, please advise.

回答2:

the single output value should be probabilistically determined (fuzzy) based on a rule from keys eg:in above case rule could be if keys have both "red" and "blue" then return "baloon" 80% of time if only blue then return "toy" 15% of time else "car" 5% of time.

Bare in mind your case analysis is not complete, and it's ambiguous, but you can do the following "in spirit" (fleshing out the desired results):

import random

def randomly_return(*colors):
    colors = set(*colors)
    if 'red' in colors and 'blue' in colors:
        if random.random() < 0.8:  # 80 % of the time
            return "baloon"

    if 'blue' in colors and len(colors) == 1:  # only blue in colors
        if random.random() < 0.15:
            return "toy"
        else:
            if random.random() < 0.05:
                return "car"

# other cases to consider

I would keep this as a function, because it is a function! But if you insist to make it dict-like, then python let's you do this by overriding __getitem__ (IMO it's not pythonic).

class RandomlyReturn(object):
    def __getitem__(self, *colors):
        return randomly_return(*colors)

>>> r = RandomlyReturn()
>>> r["red", "blue"]  # 80% of the time it'll return "baloon"
"baloon"

From your clarification, OP wants to pass and generate:

randreturn((haseither(red,blue),baloon:0.8),((hasonly(blue),toy:0.15)),(default(‌),car:0.05)))

you want to generate a function as follows:

funcs = {"haseither": lambda needles, haystack: any(n in haystack for n in needles),
         "hasonly": lambda needles, haystack: len(needles) == 1 and needles[1] in haystack}

def make_random_return(crits, default):
    def random_return(*colors):
        colors = set(*colors)
        for c in crits:
            if funcs[c["func"]](c["args"], colors) and random.random() > c["with_prob"]:
                return c["return_value"]
        return default
    return random_return

where the crit and default in this case would be:

crit = [{"func": "haseither", "args": ("red", "blue"), "return_value": "baloon", "with_prob": 0.8}, ...]
default = "car"  # ??
my_random_return = make_random_return(crits, default)

As I say, your probabilities are ambiguous/don't add up, so you're most likely going to need to tweak this...

You can extend the class definition by passing crit and default upon instantiation:

class RandomlyReturn(object):
    def __init__(self, crit, default):
        self.randomly_return = make_random_return(crit, default)
    def __getitem__(self, *colors):
        return self.randomly_return(*colors)

>>> r = RandomlyReturn(crit, default)
>>> r["red", "blue"]  # 80% of the time it'll return "baloon"
"baloon"

回答3:

If it is possible to change the data structure, it would be simpler to have a function returning the data you need. This will be completely flexible and could accommodate any kind of data, should you need to change them later.

import random

def myfunc(*args):
    if 'red' in args:
        return 'blue'
    elif 'green' in args or 'violet' in args:
        return 'violet'
    else:
        r = random.random()
        if 0 < r < 0.2:
            return 'blue'
        else:
            return 'green'

print(myfunc('green', 'blue'))
print(myfunc('yellow'))

output (the second line obviously changes):

violet
blue

回答4:

The OP wants as follows,

d["red", "blue"] ={ 
    "baloon": haseither('red','green',0.8),
    "toy": hasonly("blue",0.15),
    "car": default(0.05)
}

but this is data with embeded logic. It's very tedious to define a function for every value. What I suggest is to seprate the data and logic.

Python has a data type for this, that's class. A callable instance of a class can be assigned to the dict and let the dict pass the keys and call the object to return the result.

I've inherited and extended multiple_key_dict to support multi-key fetch and to pass keys to the object and call the object which has been stored in the dict.

I assume data is recalculated per rule. This is Rule class, it has list of rules. A rule is a Python expressions and it has access to len function and keys list. So one can write a rule like len(keys) == 1 and 'blue' in keys.

class Rule(object):

    def __init__(self, rule, data):
        self.rule = rule
        self.data = data

This is Data class which has both set of data and rules.

class Data(object):
    def __init__(self, rules):
        self.rules= rules

    def make_choice(self, data):
        data = tuple(self.make_list_of_values(data))
        return random.choice(data)

    def make_list_of_values(self, data):
        for val, weight in data:
            percent = int(weight * 100)
            for v in [val] * percent:
                yield v

    def __call__(self, keys):
        for rule in self.rules:
            if eval(rule.rule,dict(keys=keys)):
                return self.make_choice(rule.data)

This is RuleDict, but non-callables can not be fetched.

class RuleDict(multi_key_dict):
    def __init__(self, *args, **kwargs):
        multi_key_dict.__init__(self, *args, **kwargs)

    def __getitem__(self, keys):
        if isinstance(keys, str):
            keys = (keys, )
        keys_set = frozenset(keys)
        for key in self.keys():
            key = frozenset(key)
            if keys_set <= key:
                return multi_key_dict.__getitem__(self,keys[0])(keys)
        raise KeyError(keys)

usage example,

d = RuleDict()
rule1 = Rule('"red" in keys and "green" in keys',(('baloon',0.8), ('car',0.05), ('toy',0.15)))
rule2 = Rule('len(keys) ==1 and "blue" in keys',(('baloon',0.25), ('car',0.35), ('toy',0.15)))
data = Data((rule1, rule2))
d['red','blue','green'] = data

print(d['red','green'])

d['red','green'] calls the object, with keys, that was assigned and return the result.

Another approach is, to make the dict callable. This one seems a sound approach, because data and logic are separate. By this, you pass the keys and the logic, a callable, to the dict and return the result. f.e.,

def f(keys, data):
    pass # do the logic and return data

d['red','blue','green'] = ('baloon', 'car', 'toy')

Now call the dict

d(('red','blue'),f)

This is callable dict. If no callable is given, just returns the whole data.

class callable_mkd(multi_key_dict):
    def __init__(self, *args, **kwargs):
        multi_key_dict.__init__(self, *args, **kwargs)

    def __call__(self, keys, process=None):
        keys_set = frozenset(keys)
        for key in self.keys():
            key = frozenset(key)
            if keys_set <= key:
                if process:
                    return process(keys, self[keys[0]])
                return self[keys[0]]
        raise KeyError(keys)

来源：https://stackoverflow.com/questions/29938321/multikey-multivalue-non-deterministic-python-dictionary

标签

python