Multikey Multivalue Non Deterministic python dictionary

前端未结

关注

 4  1993

清酒与你 2021-02-05 23:08

There is already a multi key dict in python and also a multivalued dict. I needed a python dictionary which is both:

example:

# probabilistically fetch a


      
      
        
          4条回答        

        
                    
            
            
                         
                
              
              
                
                   情书的邮戳
                                             
                
                
                (楼主)
            
              
              
                2021-02-05 23:27
              

            
            
                        
Simulated MultiKey Dictionary

multi_key_dict did not allow __getitem__() with multiple keys at onces... 

(e.g. d["red", "green"]) 

A multi key can be simulated with tuple or set keys. If order does not matter, set seems the best (actually the hashable frozen set, so that ["red", "blue"] is the same a ["blue", "red"].

Simulated MultiVal Dictionary

Multi values are inherent by using certain datatypes, it can be any storage element that may be conveniently indexed.  A standard dict should provide that.

Non-determinism

Using a probability distribution defined by the rules and assumptions¹, non-deterministic selection is performed using this recipe from the python docs.

MultiKeyMultiValNonDeterministicDict Class

^{What a name.   \o/^{^-nice!}}

This class takes multiple keys that define a probabilistic rule set of  multiple values.  During item creation (__setitem__()) all value probabilities are precomputed for all combinations of keys¹.  During item access (__getitem__()) the precomputed probability distribution is selected and the result is evaluated based on a random weighted selection.

Definition

import random
import operator
import bisect
import itertools

# or use itertools.accumulate in python 3
def accumulate(iterable, func=operator.add):
    'Return running totals'
    # accumulate([1,2,3,4,5]) --> 1 3 6 10 15
    # accumulate([1,2,3,4,5], operator.mul) --> 1 2 6 24 120
    it = iter(iterable)
    try:
        total = next(it)
    except StopIteration:
        return
    yield total
    for element in it:
        total = func(total, element)
        yield total

class MultiKeyMultiValNonDeterministicDict(dict):

    def key_combinations(self, keys):
        """get all combinations of keys"""
        return [frozenset(subset) for L in range(0, len(keys)+1) for subset in itertools.combinations(keys, L)]

    def multi_val_rule_prob(self, rules, rule):
        """
        assign probabilities for each value, 
        spreading undefined result probabilities
        uniformly over the leftover results not defined by rule.
        """
        all_results = set([result for result_probs in rules.values() for result in result_probs])
        prob = rules[rule]
        leftover_prob = 1.0 - sum([x for x in prob.values()])
        leftover_results = len(all_results) - len(prob)
        for result in all_results:
            if result not in prob:
                # spread undefined prob uniformly over leftover results
                prob[result] = leftover_prob/leftover_results
        return prob

    def multi_key_rule_prob(self, key, val):
        """
        assign probability distributions for every combination of keys,
        using the default for combinations not defined in rule set
        """ 
        combo_probs = {}
        for combo in self.key_combinations(key):
            if combo in val:
                result_probs = self.multi_val_rule_prob(val, combo).items()
            else:
                result_probs = self.multi_val_rule_prob(val, frozenset([])).items()
            combo_probs[combo] = result_probs
        return combo_probs

    def weighted_random_choice(self, weighted_choices):
        """make choice from weighted distribution"""
        choices, weights = zip(*weighted_choices)
        cumdist = list(accumulate(weights))
        return choices[bisect.bisect(cumdist, random.random() * cumdist[-1])]

    def __setitem__(self, key, val):
        """
        set item in dictionary, 
        assigns values to keys with precomputed probability distributions
        """

        precompute_val_probs = self.multi_key_rule_prob(key, val)        
        # use to show ALL precomputed probabilities for key's rule set
        # print precompute_val_probs        

        dict.__setitem__(self, frozenset(key), precompute_val_probs)

    def __getitem__(self, key):
        """
        get item from dictionary, 
        randomly select value based on rule probability
        """
        key = frozenset([key]) if isinstance(key, str) else frozenset(key)             
        val = None
        weighted_val = None        
        if key in self.keys():
            val = dict.__getitem__(self, key)
            weighted_val = val[key]
        else:
            for k in self.keys():
                if key.issubset(k):
                    val = dict.__getitem__(self, k)
                    weighted_val = val[key]

        # used to show probabality for key
        # print weighted_val

        if weighted_val:
            prob_results = self.weighted_random_choice(weighted_val)
        else:
            prob_results = None
        return prob_results


Usage

d = MultiKeyMultiValNonDeterministicDict()

d["red","blue","green"] = {
    # {rule_set} : {result: probability}
    frozenset(["red", "green"]): {"ballon": 0.8},
    frozenset(["blue"]): {"toy": 0.15},
    frozenset([]): {"car": 0.05}
}


Testing

Check the probabilities

N = 10000
red_green_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
red_blue_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
blue_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
red_blue_green_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
default_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}

for _ in xrange(N):
    red_green_test[d["red","green"]] += 1.0
    red_blue_test[d["red","blue"]] += 1.0
    blue_test[d["blue"]] += 1.0
    default_test[d["green"]] += 1.0
    red_blue_green_test[d["red","blue","green"]] += 1.0

print 'red,green test      =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in red_green_test.items())
print 'red,blue test       =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in red_blue_test.items())
print 'blue test           =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in blue_test.items())
print 'default test        =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in default_test.items())
print 'red,blue,green test =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in red_blue_green_test.items())




red,green test      = car: 09.89% toy: 10.06% ballon: 80.05%
red,blue test       = car: 05.30% toy: 47.71% ballon: 46.99%
blue test           = car: 41.69% toy: 15.02% ballon: 43.29%
default test        = car: 05.03% toy: 47.16% ballon: 47.81%
red,blue,green test = car: 04.85% toy: 49.20% ballon: 45.95%


Probabilities match rules!



Footnotes


Distribution Assumption

Since the rule set is not fully defined, assumptions are made about the probability distributions, most of this is done in multi_val_rule_prob().  Basically any undefined probability will be spread uniformly over the remaining values. This is done for all combinations of keys, and creates a generalized key interface for the random weighted selection.

Given the example rule set

d["red","blue","green"] = {
    # {rule_set} : {result: probability}
    frozenset(["red", "green"]): {"ballon": 0.8},
    frozenset(["blue"]): {"toy": 0.15},
    frozenset([]): {"car": 0.05}
}


this will create the following distributions

'red'           = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
'green'         = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
'blue'          = [('car', 0.425), ('toy', 0.150), ('ballon', 0.425)]
'blue,red'      = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
'green,red'     = [('car', 0.098), ('toy', 0.098), ('ballon', 0.800)]
'blue,green'    = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
'blue,green,red'= [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
 default        = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]


If this is incorrect, please advise.

    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它4个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复