The two lines you indicated are indeed rather confusing. I'll try to explain them as best I can, and suggest alternative implementations.
The first one computes values for alla and allc:
(alla,allc,) = (set(s) for s in zip(*animaldictionary.keys()))
This is nearly equivalent to the loops you've already written above to build your alla and allc lists. You can skip it completely if you want. However, let's unpack what it's doing, so you can actually understand it.
The innermost part is animaldictionary.keys(). This returns an iterable object that contains all the keys of your dictionary. Since the keys in animaldictionary are two-valued tuples, that's what you'll get from the iterable. It's actually not necessary to call keys when dealing with a dictionary in most cases, since operations on the keys view are usually identical to doing the same operation on the dictionary directly.
Moving on, the keys get wrapped up by a call to the zip function using zip(*keys). There are two things happening here. First, the * syntax unpacks the iterable from above into separate arguments. So if animaldictionary's keys were ("a1", "c1"), ("a2", "c2"), ("a3", "c3"), this would call zip with those three tuples as separate arguments. Second, what zip does is turn several iterable arguments into a single iterable, yielding a tuple with the first value from each, then a tuple with the second value from each, and so on. So zip(("a1", "c1"), ("a2", "c2"), ("a3", "c3")) would return an iterator yielding ("a1", "a2", "a3") followed by ("c1", "c2", "c3").
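If you want to see the unpacking and the zip step in isolation, here is a small sketch. The keys list is made up for illustration and just stands in for whatever animaldictionary.keys() gives you:

```python
# Hypothetical keys, standing in for animaldictionary.keys()
keys = [("a1", "c1"), ("a2", "c2"), ("a3", "c3")]

# zip(*keys) is the same call as zip(keys[0], keys[1], keys[2])
columns = list(zip(*keys))  # list() forces the lazy zip iterator so we can look at it
print(columns)  # [('a1', 'a2', 'a3'), ('c1', 'c2', 'c3')]
```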
The next part is a generator expression that passes each value from the zip expression into the set constructor. This serves to eliminate any duplicates. set instances can also be useful in other ways (e.g. finding intersections) but that's not needed here.
Finally, the two sets of a and c values get assigned to variables alla and allc. They replace the lists you already had with those names (and the same contents!).
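To watch the whole line work end to end, you can run it on a tiny dictionary. The data below is invented for the example, and I've dropped the keys() call to show that zipping the dictionary directly behaves the same way:

```python
# Made-up sample data; your real keys and counts will differ
animaldictionary = {("a1", "c1"): 5, ("a2", "c1"): 7, ("a1", "c2"): 3}

# Unpacking a dict with * passes its keys, so .keys() is not needed
(alla, allc) = (set(s) for s in zip(*animaldictionary))

# Sets are unordered, so sort them before printing
print(sorted(alla))  # ['a1', 'a2']
print(sorted(allc))  # ['c1', 'c2']
```

Note that "a1" and "c1" each appear in two keys but only once in the resulting sets.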
You've already got an alternative to this, where you calculate alla and allc as lists. Using sets may be slightly more efficient, but it probably doesn't matter too much for small amounts of data. Another, clearer way to do it would be:
alla = set()
allc = set()
for key in animaldictionary:  # note, iterating over a dict yields the keys!
    a, c = key  # unpack the tuple key
    alla.add(a)
    allc.add(c)
The second line you were asking about does some averaging and combines the results into a giant string which it prints out. It is really bad programming style to cram so much into one line. And in fact, it does some needless stuff which makes it even more confusing. Here it is, with a couple of line breaks added to make it all fit on the screen at once.
print('\n'.join(['\t'.join((c,str(sum(animaldictionary.get(ac,0)
for a in alla for ac in ((a,c,),))//12)
)) for c in sorted(allc)]))
The innermost piece of this is for ac in ((a,c,),). This is silly, since it's a loop over a 1-element tuple. It's a way of renaming the tuple (a,c) to ac, but it is very confusing and unnecessary.
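A quick way to convince yourself that this loop only runs once is to try it on its own. The values of a and c here are placeholders I picked for the example:

```python
a, c = "a1", "c1"  # hypothetical values

# ((a, c,),) is a tuple holding exactly one item, the tuple (a, c),
# so the loop body runs once with ac bound to (a, c)
renamed = [ac for ac in ((a, c,),)]
print(renamed)  # [('a1', 'c1')]
```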
If we replace the one use of ac with the tuple explicitly written out, the new innermost piece is animaldictionary.get((a,c),0). This is a special way of writing animaldictionary[(a, c)] but without running the risk of causing a KeyError to be raised if (a, c) is not in the dictionary. Instead, the default value of 0 (passed in to get) will be returned for nonexistent keys.
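Here is a small sketch of the difference between get and plain indexing, using a one-entry dictionary I made up for the purpose:

```python
animaldictionary = {("a1", "c1"): 5}  # made-up sample data

present = animaldictionary.get(("a1", "c1"), 0)  # key exists, so its value is returned
absent = animaldictionary.get(("a2", "c1"), 0)   # key is missing, so the default comes back
print(present, absent)  # 5 0

try:
    animaldictionary[("a2", "c1")]  # plain indexing on a missing key
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```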
That get call is wrapped up in this: (getcall for a in alla). This is a generator expression that gets all the values from the dictionary with a given c value in the key (with a default of zero if the value is not present).
The next step is taking the average of the values in the previous generator expression: sum(genexp)//12. This is pretty straightforward, though you should note that // is floor division, which always rounds the result down to a whole number. If you want a more precise floating point value, use just / instead.
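The two division operators side by side, assuming Python 3 (which the print(...) calls suggest you're using). The total of 30 is just an example number:

```python
total = 30  # hypothetical sum of the values

floored = total // 12  # floor division drops the fractional part
exact = total / 12     # true division keeps it
print(floored)  # 2
print(exact)    # 2.5

# Floor division rounds toward negative infinity, not toward zero
negative = -30 // 12
print(negative)  # -3
```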
The next part is a call to '\t'.join, with an argument that is a single (c, avg) tuple. This is an awkward construction that could be more clearly written as c+"\t"+str(avg) or "{}\t{}".format(c, avg). All of these result in a string containing the c value, a tab character and the string form of the average calculated above.
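If you'd like to check that the three spellings really produce the same string, something like this works. The values of c and avg are placeholders:

```python
c, avg = "c1", 4  # hypothetical values

joined = "\t".join((c, str(avg)))    # the construction from the original line
concatenated = c + "\t" + str(avg)   # plain string concatenation
formatted = "{}\t{}".format(c, avg)  # format() converts avg to a string for you

print(joined == concatenated == formatted)  # True
```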
The next step is a list comprehension, [joinedstr for c in sorted(allc)] (where joinedstr is the join call in the previous step). Using a list comprehension here is a bit odd, since there's no need for a list (a generator expression would do just as well).
Finally, the list comprehension is joined with newlines and printed: print("\n".join(listcomp)). This is straightforward.
Anyway, this whole mess can be rewritten in a much clearer way, by using a few variables and printing each line separately in a loop:
for c in sorted(allc):
    total_values = sum(animaldictionary.get((a,c),0) for a in alla)
    average = total_values // 12
    print("{}\t{}".format(c, average))
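As a sanity check, you can run both versions on the same data and compare the text they produce. The dictionary below is invented for the test, and I collect the lines in a list instead of printing them so the two results can be compared:

```python
# Made-up sample data; ("a2", "c2") is deliberately missing
animaldictionary = {("a1", "c1"): 24, ("a2", "c1"): 12, ("a1", "c2"): 30}
alla, allc = (set(s) for s in zip(*animaldictionary))

# The original one-liner, minus the print call
one_liner = '\n'.join(['\t'.join((c, str(sum(animaldictionary.get(ac, 0)
    for a in alla for ac in ((a, c,),)) // 12))) for c in sorted(allc)])

# The rewritten loop, collecting each line instead of printing it
lines = []
for c in sorted(allc):
    total_values = sum(animaldictionary.get((a, c), 0) for a in alla)
    average = total_values // 12
    lines.append("{}\t{}".format(c, average))

print(one_liner == "\n".join(lines))  # True
```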
To finish, I have some general suggestions.
First, your data structure may not be optimal for the uses you are making of your data. Rather than having animaldictionary be a dictionary with (a,c) keys, it might make more sense to have a nested structure, where you index each level separately. That is, animaldictionary[a][c]. It might even make sense to have a second dictionary containing the same values indexed in the reverse order (e.g. one is indexed [a][c] while another is indexed [c][a]). With this approach you might not need the alla and allc lists for iterating (you'd just loop over the contents of the main dictionary directly).
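Here's a rough sketch of what the nested layout could look like. The names by_a and by_c are ones I made up, the sample data is invented, and using collections.defaultdict is just one convenient way to build the nesting, not the only one:

```python
from collections import defaultdict

# Made-up data in the current (a, c)-keyed layout
animaldictionary = {("a1", "c1"): 24, ("a2", "c1"): 12, ("a1", "c2"): 30}

by_a = defaultdict(dict)  # hypothetical name, indexed as by_a[a][c]
by_c = defaultdict(dict)  # hypothetical name, indexed as by_c[c][a]
for (a, c), value in animaldictionary.items():
    by_a[a][c] = value
    by_c[c][a] = value

# The averaging loop no longer needs alla or allc
for c in sorted(by_c):
    average = sum(by_c[c].values()) // 12
    print("{}\t{}".format(c, average))
```

With the data indexed by c first, missing (a, c) combinations simply don't appear, so there is no need for get with a default.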
My second suggestion is about code style. Many of your variables are named poorly, either because their names don't have any meaning (e.g. c) or because the names imply a meaning that is incorrect. The most glaring issue is your key and value variables, which in fact unpack two pieces of the key (AKA a and c). In other situations you can get keys and values together, but only when you are iterating over a dictionary's items() view rather than over the dictionary directly.
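To make the distinction concrete, here is a short sketch of both kinds of iteration over a small made-up dictionary:

```python
animaldictionary = {("a1", "c1"): 24, ("a2", "c1"): 12}  # made-up sample data

# Iterating over the dictionary itself yields only the keys
keys_seen = []
for key in animaldictionary:
    a, c = key  # both names come from the key; no value is involved
    keys_seen.append((a, c))

# Iterating over items() yields (key, value) pairs
pairs = []
for key, value in animaldictionary.items():
    a, c = key
    pairs.append((a, c, value))

print(keys_seen)  # [('a1', 'c1'), ('a2', 'c1')]
print(pairs)      # [('a1', 'c1', 24), ('a2', 'c1', 12)]
```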