Python: How to update value of key value pair in nested dictionary?

I am trying to build an inverted document index, so for every unique word in a collection I need to know which documents it occurs in and how often.

i have used this an

9 answers

天命终不由人

2021-01-21 21:18
In the AutoVivification class, you define
```
value = self[item] = type(self)()
return value
```
which returns a new instance of type(self), that is, another AutoVivification in this context. The error then becomes clear: the missing entry is a dictionary, not a number, so it cannot be incremented.
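
A minimal sketch of that failure. The question's own code is truncated here, so the class body below is an assumed reconstruction built around the two lines quoted above, and the name `index` and the keys are hypothetical:

```python
class AutoVivification(dict):
    """Assumed reconstruction of the class from the question."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            # A missing key is filled with another AutoVivification, not a number.
            value = self[item] = type(self)()
            return value

index = AutoVivification()  # hypothetical name

try:
    # index['word'] is created on access, then index['word']['doc_1'] is
    # created as an empty AutoVivification, and 1 is added to that dict.
    index['word']['doc_1'] += 1
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__)
```

The increment fails with a TypeError because a dict subclass does not support addition with an int; the empty nested entry is still left behind in `index`.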

Are you sure you want to return an AutoVivification for every lookup of a missing key? From the code, I would assume you want the inner level to be a normal dictionary with string keys and int values.

By the way, you may be interested in the defaultdict class from the collections module.
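
A short sketch of how defaultdict could cover this case. The sample documents and the names `docs` and `index` are made up for illustration and are not taken from the question:

```python
from collections import defaultdict

# Hypothetical sample data standing in for the question's document collection.
docs = {
    'doc_1': 'the cat sat on the mat',
    'doc_2': 'the dog sat',
}

# word -> {document id -> count}; a missing count starts at int() == 0.
index = defaultdict(lambda: defaultdict(int))

for doc_id, text in docs.items():
    for word in text.split():
        index[word][doc_id] += 1

print(dict(index['the']))
print(dict(index['sat']))
```

Here the outer level creates an inner defaultdict(int) for each new word, and the inner level creates a 0 for each new document, so the increment works at both levels without a custom class.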
没有蜡笔的小新

2021-01-21 21:27
I think you are trying to add 1 to a dictionary entry that doesn't exist yet. Your __getitem__ method returns a new instance of the AutoVivification class when a lookup fails, so you end up trying to add 1 to a new instance of the class.

I think the answer is to update the __getitem__ method so that it sets the counter to 0 if it doesn't exist yet.
```
class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            self[item] = 0
            return 0
```
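
Note that with this version a missing key yields 0 rather than a nested dictionary, so it only fits the innermost level (the per-document counts). A hedged sketch of one way to use it under a plain outer dict; the `index` name and the sample occurrences are illustrative assumptions:

```python
class AutoVivification(dict):
    """Counter-like dict: a missing key is stored as 0 and 0 is returned."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            self[item] = 0
            return 0

index = {}  # word -> AutoVivification mapping document id -> count

# Hypothetical (word, document id) occurrences.
occurrences = [('cat', 'doc_1'), ('cat', 'doc_1'), ('cat', 'doc_2'), ('dog', 'doc_2')]

for word, doc_id in occurrences:
    # Create the per-word counter on first sight, then increment it.
    counts = index.setdefault(word, AutoVivification())
    counts[doc_id] += 1

print(index)
```

Using the class directly as the outer container would not work, since `index['cat']` would then return 0 and the second subscript would fail.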
Hope this helps.
伪装坚强ぢ

2021-01-21 21:28

```
#!/usr/bin/env python
# encoding: utf-8
from os.path import join
from glob import glob as glob_
from collections import defaultdict, Counter
from string import punctuation

WORKDIR  = 'temp/'
FILETYPE = '*.html'
OUTF     = 'doc_{0}'.format

def extract(text, startTag='<pre>', endTag='</pre>'):
    """Extract text between start tag and end tag

    Start at first char following first occurrence of startTag
      If none, begin at start of text
    End at last char preceding first subsequent occurrence of endTag
      If none, end at end of text
    """
    return text.split(startTag, 1)[-1].split(endTag, 1)[0]

def main():
    # word -> {document name -> number of occurrences}
    DocWords = defaultdict(dict)

    infnames = glob_(join(WORKDIR, FILETYPE))
    for docId, infname in enumerate(infnames, 1):
        outfname = OUTF(docId)
        with open(infname) as inf:
            text = inf.read().lower()
        # Strip punctuation from each word, not only from the ends of the whole text
        words = [wd.strip(punctuation) for wd in extract(text).split()]
        # items() works on Python 2 and 3 (iteritems() is Python 2 only)
        for wd, num in Counter(wd for wd in words if wd).items():
            DocWords[wd][outfname] = num
    return DocWords

if __name__ == '__main__':
    main()
```
