The way I go about building a nested dictionary is this:
dicty = dict()
tmp = dict()
tmp["a"] = 1
tmp["b"] = 2
dicty["A"] = tmp
dicty == {"A" : {"a" : 1, "b
While this isn't the ideal way to do things, you're pretty close to making it work.
Your main problem is that you're reusing the same tmp dictionary. After you insert it into dicty under the first key, you then clear it and start filling it with the new values. Replace tmp.clear() with tmp = {} to fix that, so you have a different dictionary for each key, instead of the same one for all keys.
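To see why tmp.clear() causes trouble here, a minimal sketch may help. The keys and values are made up for illustration; only the clear-versus-rebind behaviour matters.

```python
# Reusing one dict: clear() empties the object that dicty["A"] still refers to.
dicty = {}
tmp = {}
tmp["a"] = 1
dicty["A"] = tmp
tmp.clear()
tmp["x"] = 9
dicty["B"] = tmp
print(dicty)  # {'A': {'x': 9}, 'B': {'x': 9}}

# Rebinding tmp to a new dict leaves the one stored under "A" untouched.
fixed = {}
tmp = {}
tmp["a"] = 1
fixed["A"] = tmp
tmp = {}
tmp["x"] = 9
fixed["B"] = tmp
print(fixed)  # {'A': {'a': 1}, 'B': {'x': 9}}
```

In the first half both keys point at one and the same object, which is why the values stored under "A" disappear.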
Your second problem is that you're never storing the last tmp value in the dictionary when you reach the end, so add another dicty[oldword] = tmp after the for loop.
Your third problem is that your if statement checks oldword is not "". That may be true even if oldword is an empty string, because you're comparing identity, not equality. Just change that to if oldword: instead. (This one, you'll usually get away with, because small strings are usually interned and will usually share identity… but you shouldn't count on that.)
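A small sketch of the difference, using made-up strings. The has_word helper is hypothetical and only there to show the truthiness test. Whether two equal strings are also the same object is an interpreter detail, so the sketch prints that result without relying on it.

```python
# Build one string at runtime so the two names are not simply the same literal.
oldword = "".join(["front", "page"])
other = "frontpage"

print(oldword == other)  # True: equality compares the characters
print(oldword is other)  # identity: the answer depends on the interpreter

def has_word(word):
    # Truthiness: the empty string is false, any non-empty string is true.
    return bool(word)

print(has_word(""))       # False
print(has_word(oldword))  # True
```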
If you fix all three of those, you get this:
{'FrontPage': {'frontpage': '0.710145', 'troubleshooting': '0.971014'},
'proA': {'macbook': '0.666667', 'smart': '0.666667', 'ssd': '0.666667'}}
I'm not sure how to turn this into the format you claim to want, because that format isn't even a valid dictionary. But hopefully this gets you close.
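For reference, here is roughly what the loop might look like with those fixes applied. The original loop isn't reproduced here, so the function name, the in-memory input lines, and the exact loop shape are assumptions; the rows are reconstructed from the output shown above.

```python
def double_dict_manual(lines):
    """Hypothetical reconstruction of the manual loop with the fixes applied."""
    dicty = {}
    tmp = {}
    oldword = ""
    for line in lines:
        values = line.split()
        word = values[0]
        if oldword and word != oldword:  # truthiness test instead of `is not ""`
            dicty[oldword] = tmp
            tmp = {}                     # new dict instead of tmp.clear()
        tmp[values[1]] = values[2]
        oldword = word
    if oldword:
        dicty[oldword] = tmp             # store the last group after the loop
    return dicty

# Assumed input, batched by the first field.
lines = [
    "FrontPage frontpage 0.710145",
    "FrontPage troubleshooting 0.971014",
    "proA macbook 0.666667",
    "proA smart 0.666667",
    "proA ssd 0.666667",
]
result = double_dict_manual(lines)
print(result)
```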
There are two simpler ways to do it:
1. Group the rows with itertools.groupby, then transform each group into a dict and insert it all in one step. This, like your existing code, requires that the input already be batched by values[0].
2. Look up each key's inner dictionary as you go, creating it when it's missing. defaultdict or the setdefault method will make this concise, but even if you don't know about those, it's pretty simple to write it out explicitly, and it'll still be less verbose than what you have now.
The second version is already explained very nicely in Martijn Pieters's answer.
The first can be written like this:
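import itertools
import operator

def doubleDict(filename):
    with open(filename, "r") as f:
        rows = (line.rstrip().split(" ") for line in f)
        # groupby only merges adjacent rows that share the same first field.
        return {k: {values[1]: values[2] for values in g}
                for k, g in itertools.groupby(rows, key=operator.itemgetter(0))}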
Of course that doesn't print out the dict so far after every 25 rows, but that's easy to add by turning the comprehension into an explicit loop (and ideally using enumerate instead of keeping an explicit row counter).
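One way that explicit loop might look, as a sketch only. The function name and the every parameter are made up, and it reads from a list of lines rather than a file to keep the example self-contained.

```python
import itertools

def double_dict_with_progress(lines, every=25):
    """Group rows by their first field, printing the dict so far every `every` rows."""
    result = {}
    rows = (line.split() for line in lines)
    # Number the rows from 1 so the row count survives the grouping.
    numbered = enumerate(rows, start=1)
    for key, group in itertools.groupby(numbered, key=lambda pair: pair[1][0]):
        inner = result[key] = {}
        for count, (_, name, value) in group:
            inner[name] = value
            if count % every == 0:
                print(result)
    return result

sample = ["A a 1", "A b 2", "B c 3"]
grouped = double_dict_with_progress(sample)
print(grouped)  # {'A': {'a': '1', 'b': '2'}, 'B': {'c': '3'}}
```

As with the comprehension, this still assumes the input is already batched by the first field; a key that reappears later would replace its earlier group.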
Use a collections.defaultdict() object to auto-instantiate nested dictionaries:
from collections import defaultdict

def doubleDict(filename):
    dicty = defaultdict(dict)
    with open(filename, "r") as f:
        # Count lines from 1 so the progress print fires after every 25th line.
        for i, line in enumerate(f, 1):
            outer, inner, value = line.split()
            dicty[outer][inner] = value
            if i % 25 == 0:
                print(dicty)
    return dicty
I used enumerate() to generate the line count here; much simpler than keeping a separate counter going.
Even without a defaultdict, you can let the outer dictionary keep the reference to the nested dictionary, and retrieve it again by using values[0]; there is no need to keep the tmp reference around:
>>> dicty = {}
>>> dicty['A'] = {}
>>> dicty['A']['a'] = 1
>>> dicty['A']['b'] = 2
>>> dicty
{'A': {'a': 1, 'b': 2}}
All the defaultdict then does is keep us from having to test if we already created that nested dictionary. Instead of:
if outer not in dicty:
    dicty[outer] = {}
dicty[outer][inner] = value
we simply omit the if test as defaultdict will create a new dictionary for us if the key was not yet present.
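The setdefault method mentioned in the other answer folds the same test into a single call on a plain dict. A minimal sketch, with a made-up function name and in-memory input lines standing in for the file:

```python
def double_dict_setdefault(lines):
    dicty = {}
    for line in lines:
        outer, inner, value = line.split()
        # setdefault returns the existing inner dict, or stores and returns a new one.
        dicty.setdefault(outer, {})[inner] = value
    return dicty

sample = ["A a 1", "B c 3", "A b 2"]
nested = double_dict_setdefault(sample)
print(nested)  # {'A': {'a': '1', 'b': '2'}, 'B': {'c': '3'}}
```

Unlike the groupby version, this does not need the input to be batched by the outer key, and the result is an ordinary dict rather than a defaultdict.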