Python: How to update value of key value pair in nested dictionary?

淺唱寂寞╮ submitted on 2019-12-02 02:24:43

One could use Python's collections.defaultdict instead of creating an AutoVivification class and then instantiating dictionary as an object of that type.

import collections
dictionary = collections.defaultdict(lambda: collections.defaultdict(int))

This will create a dictionary of dictionaries with a default value of 0. When you wish to increment an entry, use:

dictionary[keyword][filename] += 1
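As a minimal sketch of the approach above, assuming a small made-up corpus (the file names and words in `docs` are hypothetical and only there for illustration):

```python
import collections

# Hypothetical sample data: file name -> list of words in that file.
docs = {
    "a.txt": ["spam", "eggs", "spam"],
    "b.txt": ["spam", "ham"],
}

dictionary = collections.defaultdict(lambda: collections.defaultdict(int))

for filename, text in docs.items():
    for keyword in text:
        # Missing outer and inner keys are created on first access.
        dictionary[keyword][filename] += 1

# Convert to plain dicts for printing or serialization.
counts = {word: dict(files) for word, files in dictionary.items()}
print(counts)
```

One thing to keep in mind: merely reading a missing key from a defaultdict also creates it, so use `in` or `.get()` when you only want to look something up.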

I agree you should avoid the extra classes, and especially __getitem__. (Small conceptual errors can make __getitem__ or __getattr__ quite painful to debug.)

Python's built-in dict seems powerful enough for what you are doing.

What about a straightforward dict.setdefault?

    for keyword in uniques:                             #For every unique word do   
        for word in text:                               #for every word in doc:
            if (word == keyword):
                dictionary.setdefault(keyword, {})
                dictionary[keyword].setdefault(filename, 0)
                dictionary[keyword][filename] += 1

Of course this would be where dictionary is just a dict, and not something from collections or a custom class of your own.

Then again, isn't this just:

        for word in text:                               #for every word in doc:
            dictionary.setdefault(word, {})
            dictionary[word].setdefault(filename, 0)
            dictionary[word][filename] += 1

No reason to isolate unique instances, since the dict forces unique keys anyway.
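Since setdefault returns the stored value, the lookups can be shortened a little. The following is only a sketch: `count_words` is a hypothetical helper name and the sample words are made up.

```python
def count_words(dictionary, filename, text):
    """Add the word counts of one document to dictionary (a plain dict)."""
    for word in text:
        # setdefault returns the existing inner dict, or inserts and returns {}.
        per_file = dictionary.setdefault(word, {})
        per_file[filename] = per_file.get(filename, 0) + 1
    return dictionary

dictionary = {}
count_words(dictionary, "a.txt", ["spam", "eggs", "spam"])
count_words(dictionary, "b.txt", ["spam"])
print(dictionary)
```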

if (word == keyword and dictionary[keyword][filename] is not None): 

I don't think that is correct usage. Try this instead:

if (word == keyword and filename in dictionary[keyword]): 

Looking up a key that does not exist raises KeyError, so you must check that the key exists in the dictionary before reading its value.
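A short sketch of that check with a plain dict follows. `count_for` is a hypothetical helper, and it also tests the outer key, which a plain dict needs as well.

```python
dictionary = {"spam": {"a.txt": 2}}

def count_for(dictionary, keyword, filename):
    """Return the stored count, or 0 if either key is missing."""
    if keyword in dictionary and filename in dictionary[keyword]:
        return dictionary[keyword][filename]
    return 0

# Reading a missing key directly raises KeyError.
try:
    dictionary["spam"]["b.txt"]
    missing = None
except KeyError as exc:
    missing = exc.args[0]

print(count_for(dictionary, "spam", "a.txt"))
print(count_for(dictionary, "ham", "a.txt"))
```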

I think you are trying to add 1 to a dictionary entry that doesn't exist yet. Your __getitem__ method returns a new instance of the AutoVivification class whenever a lookup fails, so you end up trying to add 1 to a new instance of the class.

I think the answer is to update the __getitem__ method so that it sets the counter to 0 if it doesn't exist yet.

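class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            # Caveat: every missing key now defaults to 0, so this only suits
            # the innermost (counter) level; a missing outer key would also
            # yield 0, which cannot be indexed further.
            self[item] = 0
            return 0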

Hope this helps.

Not sure why you need nested dicts here. In a typical index scenario you have a forward index mapping

document id -> [word_ids]

and an inverted index mapping

word_id -> [document_ids]

I'm not sure whether this applies here, but with two indexes you can perform all kinds of queries very efficiently, and the implementation is straightforward since you don't need to deal with nested data structures.
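A rough sketch of the two indexes, under a few assumptions: words are used directly instead of word ids, the inverted index stores sets rather than lists so that queries can intersect them, and the sample documents and the helper `documents_with_all` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: document id -> words in that document.
docs = {1: ["spam", "eggs", "spam"], 2: ["spam", "ham"]}

forward_index = {}                 # document id -> [words]
inverted_index = defaultdict(set)  # word -> {document ids}

for doc_id, words in docs.items():
    forward_index[doc_id] = list(words)
    for word in words:
        inverted_index[word].add(doc_id)

def documents_with_all(*words):
    """Return the ids of documents that contain every given word."""
    # .get() avoids creating entries for unknown words.
    sets = [inverted_index.get(w, set()) for w in words]
    return set.intersection(*sets) if sets else set()

print(documents_with_all("spam", "ham"))
```

Per-document word counts, if needed, can be derived from the forward index, for example with `forward_index[1].count("spam")`.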

In the AutoVivification class, you define

value = self[item] = type(self)()
return value

which returns a new instance of type(self), that is, an AutoVivification in this context. The error then becomes clear.

Are you sure you want to return an AutoVivification on any missing key query? From the code, I would assume you want to return a normal dictionary with string key and int values.

By the way, maybe you would be interested in the defaultdict class.

This AutoVivification class is not the magic you are looking for.

Check out collections.defaultdict from the standard library. Your inner dicts should be defaultdicts that default to integer values, and your outer dicts would then be defaultdicts that default to inner-dict values.

It would be better to drop AutoVivification altogether, because it adds nothing.

The following line:

if (word == keyword and dictionary[keyword][filename] is not None):

doesn't work as expected. Because of the way your class works, dictionary[keyword] will always return an instance of AutoVivification, and so will dictionary[keyword][filename].
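To make this concrete, here is a small sketch. The class body is an assumption reconstructed from the two lines quoted earlier, not necessarily the asker's exact code, and the keys are made up.

```python
class AutoVivification(dict):
    """Assumed reconstruction of the class under discussion."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value

dictionary = AutoVivification()

# A missing entry is never None: it is a fresh, empty AutoVivification.
entry = dictionary["spam"]["a.txt"]
always_true = entry is not None

# Incrementing it fails, because a dict subclass cannot be added to an int.
try:
    dictionary["spam"]["a.txt"] += 1
    error = None
except TypeError as exc:
    error = exc

print(type(entry).__name__, always_true, type(error).__name__)
```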

Hugh Bothwell
#!/usr/bin/env python
# encoding: utf-8
from os.path import join
from glob import glob as glob_
from collections import defaultdict, Counter
from string import punctuation

WORKDIR  = 'temp/'
FILETYPE = '*.html'
OUTF     = 'doc_{0}'.format

def extract(text, startTag='<pre>', endTag='</pre>'):
    """Extract text between start tag and end tag

    Start at first char following first occurrence of startTag
      If none, begin at start of text
    End at last char preceding first subsequent occurrence of endTag
      If none, end at end of text
    """
    return text.split(startTag, 1)[-1].split(endTag, 1)[0]    

def main():
    DocWords = defaultdict(dict)

    infnames = glob_(join(WORKDIR, FILETYPE))
    for docId,infname in enumerate(infnames, 1):
        outfname = OUTF(docId)
        with open(infname) as inf:
            text = inf.read().lower()
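        # Strip punctuation from each word, not just from the ends of the text
        words = [w.strip(punctuation) for w in extract(text).split()]
        words = [w for w in words if w]  # drop tokens that were only punctuation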
        for wd, num in Counter(words).items():  # .items() works on Python 2.7 and 3
            DocWords[wd][outfname] = num

if __name__ == '__main__':
    main()