Python: How to update the value of a key-value pair in a nested dictionary?

广开言路 2021-01-21 20:25

I am trying to build an inverted document index, so for every unique word in a collection I need to know which documents it occurs in and how often.

i have used this an

9 answers
  • 2021-01-21 21:06

    I agree you should avoid the extra classes, and especially __getitem__. (Small conceptual errors can make __getitem__ or __getattr__ quite painful to debug.)

    A plain Python dict seems more than capable of what you are doing.

    What about a straightforward dict.setdefault?

        for keyword in uniques:        # for every unique word
            for word in text:          # for every word in the doc
                if word == keyword:
                    dictionary.setdefault(keyword, {})
                    dictionary[keyword].setdefault(filename, 0)
                    dictionary[keyword][filename] += 1
    

    Of course, this assumes dictionary is a plain dict, not something from collections or a custom class of your own.

    Then again, isn't this just:

        for word in text:              # for every word in the doc
            dictionary.setdefault(word, {})
            dictionary[word].setdefault(filename, 0)
            dictionary[word][filename] += 1
    

    No reason to isolate unique instances, since the dict forces unique keys anyway.
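
    As a minimal, self-contained sketch of the setdefault approach above: the docs mapping and its file names are made-up sample data, not part of the original question.

```python
# Hypothetical sample data: filename -> list of words in that document.
docs = {
    "a.txt": ["red", "fish", "blue", "fish"],
    "b.txt": ["blue", "sky"],
}

dictionary = {}  # a plain dict: word -> {filename: count}
for filename, text in docs.items():
    for word in text:
        # Create the inner dict and the counter the first time they are
        # needed, then increment the counter.
        per_file = dictionary.setdefault(word, {})
        per_file.setdefault(filename, 0)
        per_file[filename] += 1

print(dictionary["fish"])  # {'a.txt': 2}
print(dictionary["blue"])  # {'a.txt': 1, 'b.txt': 1}
```

    Holding on to the inner dict returned by setdefault avoids looking up dictionary[word] three times per word.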

  • 2021-01-21 21:07

    One could use Python's collections.defaultdict instead of creating an AutoVivification class and then instantiating dictionary as an object of that type.

    import collections
    dictionary = collections.defaultdict(lambda: collections.defaultdict(int))
    

    This creates a dictionary of dictionaries whose inner values default to 0. When you wish to increment an entry, use:

    dictionary[keyword][filename] += 1
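
    Put together, a full pass over a small collection might look like the sketch below. The docs mapping is hypothetical sample data.

```python
import collections

# Hypothetical sample data: filename -> list of words in that document.
docs = {
    "a.txt": ["red", "fish", "blue", "fish"],
    "b.txt": ["blue", "sky"],
}

dictionary = collections.defaultdict(lambda: collections.defaultdict(int))
for filename, text in docs.items():
    for word in text:
        # Missing outer and inner keys are created on demand, starting at 0.
        dictionary[word][filename] += 1

# Convert to plain dicts, e.g. for printing or serialising.
plain = {word: dict(counts) for word, counts in dictionary.items()}
print(plain["fish"])  # {'a.txt': 2}
print(plain["blue"])  # {'a.txt': 1, 'b.txt': 1}
```

    One thing to keep in mind: reading dictionary["some_word"] on a defaultdict inserts an empty entry for that word, whereas the membership test "some_word" in dictionary does not.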
    
  • 2021-01-21 21:09

    It would be better to kick AutoVivification out altogether, because it adds nothing.

    The following line:

    if (word == keyword and dictionary[keyword][filename] is not None):
    

    does not work as expected. Because of the way your class works, dictionary[keyword] will always return an instance of AutoVivification for a missing key, and so will dictionary[keyword][filename], so the "is not None" test never fails.
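
    The asker's class is not shown on this page. Assuming it is the common recipe that subclasses dict and creates a nested instance whenever a key is missing, the behaviour can be reproduced like this (the class body below is that assumption, not the asker's actual code):

```python
class AutoVivification(dict):
    """Assumed definition: the common recipe, not necessarily the asker's class."""

    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            # On a missing key, store and return a fresh nested instance.
            value = self[item] = type(self)()
            return value


dictionary = AutoVivification()
entry = dictionary["word"]["doc1.txt"]      # neither key existed before

print(isinstance(entry, AutoVivification))  # True
print(entry is not None)                    # True
print("doc1.txt" in dictionary["word"])     # True: the lookup created the key
```

    Under that assumption, the check cannot tell "seen before" from "never seen", and merely performing it adds empty entries to the index.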

  • 2021-01-21 21:11
    if (word == keyword and dictionary[keyword][filename] is not None): 
    

    I don't think that is the correct usage. Try this instead:

    if (word == keyword and filename in dictionary[keyword]): 
    

    Looking up a non-existent key in a plain dict raises KeyError, so you must first check whether the key exists in the dictionary.
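
    A small sketch of that membership check on a plain dict follows. The index contents and the "b.txt" / "whale" names are made up for illustration.

```python
# Hypothetical index contents: word -> {filename: count}.
dictionary = {"fish": {"a.txt": 2}}

# Indexing a missing key on a plain dict raises KeyError.
try:
    dictionary["fish"]["b.txt"]
except KeyError as exc:
    print("missing key:", exc)

keyword, filename = "fish", "b.txt"

# Test for membership before touching the value.
if keyword in dictionary and filename in dictionary[keyword]:
    dictionary[keyword][filename] += 1
else:
    dictionary.setdefault(keyword, {})[filename] = 1

# dict.get is an alternative for read-only lookups with a fallback value.
whale_count = dictionary.get("whale", {}).get("a.txt", 0)
print(dictionary)   # {'fish': {'a.txt': 2, 'b.txt': 1}}
print(whale_count)  # 0
```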

  • 2021-01-21 21:14

    Not sure why you need nested dicts here. In a typical index scenario you have a forward index mapping

    document_id -> [word_ids]

    and an inverse index mapping

    word_id -> [document_ids]

    Not sure if this applies here, but with two indexes you can perform all kinds of queries very efficiently, and the implementation is straightforward since you don't need to deal with nested data structures.
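
    A rough sketch of the two-index idea, with made-up document ids. For brevity it stores the words themselves rather than numeric word ids, which is a simplification of the mappings described above.

```python
from collections import defaultdict

# Forward index (hypothetical sample data): document id -> words in order.
forward = {
    "doc1": ["red", "fish", "blue", "fish"],
    "doc2": ["blue", "sky"],
}

# Inverse index: word -> ids of the documents that contain it.
inverse = defaultdict(list)
for doc_id, words in forward.items():
    # dict.fromkeys drops duplicate words while keeping first-seen order.
    for word in dict.fromkeys(words):
        inverse[word].append(doc_id)

# "Which documents contain 'blue'?" is answered by the inverse index.
print(inverse["blue"])  # ['doc1', 'doc2']

# "How often does 'fish' occur in each of them?" combines both indexes.
freq = {doc_id: forward[doc_id].count("fish") for doc_id in inverse["fish"]}
print(freq)  # {'doc1': 2}
```

    Both structures stay flat; the per-document counts are derived on demand from the forward index rather than stored in a nested dict.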

  • 2021-01-21 21:15

    This AutoVivification class is not the magic you are looking for.

    Check out collections.defaultdict from the standard library. Your inner dicts should be defaultdicts that default to integer values, and your outer dicts would then be defaultdicts that default to inner-dict values.
