I have a data structure which essentially amounts to a nested dictionary. Let\'s say it looks like this:
{\'new jersey\': {\'mercer county\': {\'plumbers\':
For easy iterating over your nested dictionary, why not just write a simple generator?
def each_job(my_dict):
for state, a in my_dict.items():
for county, b in a.items():
for job, value in b.items():
yield {
'state' : state,
'county' : county,
'job' : job,
'value' : value
}
So then, if you have your compilicated nested dictionary, iterating over it becomes simple:
for r in each_job(my_dict):
print "There are %d %s in %s, %s" % (r['value'], r['job'], r['county'], r['state'])
Obviously your generator can yield whatever format of data is useful to you.
Why are you using try catch blocks to read the tree? It's easy enough (and probably safer) to query whether a key exists in a dict before trying to retrieve it. A function using guard clauses might look like this:
if not my_dict.has_key('new jersey'):
return False
nj_dict = my_dict['new jersey']
...
Or, a perhaps somewhat verbose method, is to use the get method:
value = my_dict.get('new jersey', {}).get('middlesex county', {}).get('salesmen', 0)
But for a somewhat more succinct way, you might want to look at using a collections.defaultdict, which is part of the standard library since python 2.5.
import collections
def state_struct(): return collections.defaultdict(county_struct)
def county_struct(): return collections.defaultdict(job_struct)
def job_struct(): return 0
my_dict = collections.defaultdict(state_struct)
print my_dict['new jersey']['middlesex county']['salesmen']
I'm making assumptions about the meaning of your data structure here, but it should be easy to adjust for what you actually want to do.
Unless your dataset is going to stay pretty small, you might want to consider using a relational database. It will do exactly what you want: make it easy to add counts, selecting subsets of counts, and even aggregate counts by state, county, occupation, or any combination of these.
You can use recursion in lambdas and defaultdict, no need to define names:
a = defaultdict((lambda f: f(f))(lambda g: lambda:defaultdict(g(g))))
Here's an example:
>>> a['new jersey']['mercer county']['plumbers']=3
>>> a['new jersey']['middlesex county']['programmers']=81
>>> a['new jersey']['mercer county']['programmers']=81
>>> a['new jersey']['middlesex county']['salesmen']=62
>>> a
defaultdict(<function __main__.<lambda>>,
{'new jersey': defaultdict(<function __main__.<lambda>>,
{'mercer county': defaultdict(<function __main__.<lambda>>,
{'plumbers': 3, 'programmers': 81}),
'middlesex county': defaultdict(<function __main__.<lambda>>,
{'programmers': 81, 'salesmen': 62})})})
For the following (copied from above) is there a way to implement the append function. I am trying to use a nested dictionary to store values as array.
class Vividict(dict):
def __missing__(self, key):
value = self[key] = type(self)() # retain local pointer to value
return value
My current implementation is as follows:
totalGeneHash=Vividict()
for keys in GenHash:
for second in GenHash[keys]:
if keys in sampleHash:
total_val = GenHash[keys][second]
totalGeneHash[gene][keys].append(total_val)
This is the error I get: AttributeError: 'Vividict' object has no attribute 'append'
collections.defaultdict
can be sub-classed to make a nested dict. Then add any useful iteration methods to that class.
>>> from collections import defaultdict
>>> class nesteddict(defaultdict):
def __init__(self):
defaultdict.__init__(self, nesteddict)
def walk(self):
for key, value in self.iteritems():
if isinstance(value, nesteddict):
for tup in value.walk():
yield (key,) + tup
else:
yield key, value
>>> nd = nesteddict()
>>> nd['new jersey']['mercer county']['plumbers'] = 3
>>> nd['new jersey']['mercer county']['programmers'] = 81
>>> nd['new jersey']['middlesex county']['programmers'] = 81
>>> nd['new jersey']['middlesex county']['salesmen'] = 62
>>> nd['new york']['queens county']['plumbers'] = 9
>>> nd['new york']['queens county']['salesmen'] = 36
>>> for tup in nd.walk():
print tup
('new jersey', 'mercer county', 'programmers', 81)
('new jersey', 'mercer county', 'plumbers', 3)
('new jersey', 'middlesex county', 'programmers', 81)
('new jersey', 'middlesex county', 'salesmen', 62)
('new york', 'queens county', 'salesmen', 36)
('new york', 'queens county', 'plumbers', 9)
This is a function that returns a nested dictionary of arbitrary depth:
from collections import defaultdict
def make_dict():
return defaultdict(make_dict)
Use it like this:
d=defaultdict(make_dict)
d["food"]["meat"]="beef"
d["food"]["veggie"]="corn"
d["food"]["sweets"]="ice cream"
d["animal"]["pet"]["dog"]="collie"
d["animal"]["pet"]["cat"]="tabby"
d["animal"]["farm animal"]="chicken"
Iterate through everything with something like this:
def iter_all(d,depth=1):
for k,v in d.iteritems():
print "-"*depth,k
if type(v) is defaultdict:
iter_all(v,depth+1)
else:
print "-"*(depth+1),v
iter_all(d)
This prints out:
- food
-- sweets
--- ice cream
-- meat
--- beef
-- veggie
--- corn
- animal
-- pet
--- dog
---- labrador
--- cat
---- tabby
-- farm animal
--- chicken
You might eventually want to make it so that new items can not be added to the dict. It's easy to recursively convert all these defaultdict
s to normal dict
s.
def dictify(d):
for k,v in d.iteritems():
if isinstance(v,defaultdict):
d[k] = dictify(v)
return dict(d)