"Bases" refers to A, T, G and C.
sample = [['CGG','ATT'],['GCGC','TAAA']]
# Note on fragility of data: each element can be made up of only 2 of the 4 bases
You are not really using Counter any differently than you would a plain dict. Try something like the following approach:
>>> sample = [['CGG','ATT'],['GCGC','TAAA']]
>>> from collections import Counter
>>> base_counts = [[Counter(base) for base in sub] for sub in sample]
>>> base_counts
[[Counter({'G': 2, 'C': 1}), Counter({'T': 2, 'A': 1})], [Counter({'G': 2, 'C': 2}), Counter({'A': 3, 'T': 1})]]
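For comparison, here is a small sketch of what a Counter gives you that a plain dict does not. It uses only documented collections.Counter behaviour: missing keys count as zero, most_common() orders by count, and two Counters can be added together. The names counts and pooled are just for illustration.

```python
from collections import Counter

counts = Counter('CGG')

# Unlike a plain dict, looking up a missing key returns 0 instead of raising KeyError
# (and the lookup does not insert the key)
missing = counts['A']

# most_common() lists (element, count) pairs from highest count to lowest
ranked = counts.most_common()

# Counters can be added, e.g. to pool the counts of two strands
pooled = Counter('CGG') + Counter('ATT')

print(missing, ranked, pooled)
```
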
Now you can continue with a functional approach using nested comprehensions to transform your data*:
>>> base_freqs = [[{base: n / len(seq) for base, n in count.items()} for count, seq in zip(counts, seqs)]
...               for counts, seqs in zip(base_counts, sample)]
>>>
>>> base_freqs
[[{'C': 0.3333333333333333, 'G': 0.6666666666666666}, {'A': 0.3333333333333333, 'T': 0.6666666666666666}], [{'G': 0.5, 'C': 0.5}, {'T': 0.25, 'A': 0.75}]]
>>>
*Note: some people do not like big, nested comprehensions like that. I think they're fine as long as you stick to functional constructs and don't mutate data structures inside your comprehensions; I actually find them very expressive. Others disagree vehemently. You can always unfold that code into nested for-loops.
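If you prefer loops, here is a sketch of the base-frequency step unfolded into plain nested for-loops. It recounts from sample directly instead of reusing base_counts, so it runs on its own; the names group, seq and freqs are illustrative.

```python
from collections import Counter

sample = [['CGG', 'ATT'], ['GCGC', 'TAAA']]

base_freqs = []
for group in sample:
    group_freqs = []
    for seq in group:
        count = Counter(seq)
        freqs = {}
        for base, n in count.items():
            # Relative frequency of this base within the string it came from
            freqs[base] = n / len(seq)
        group_freqs.append(freqs)
    base_freqs.append(group_freqs)

print(base_freqs)
```
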
Anyway, you can then do the same thing with the pairs. First:
>>> pairs = [list(zip(*bases)) for bases in sample]
>>> pairs
[[('C', 'A'), ('G', 'T'), ('G', 'T')], [('G', 'T'), ('C', 'A'), ('G', 'A'), ('C', 'A')]]
>>> pair_counts = [Counter(base_pair) for base_pair in pairs]
>>> pair_counts
[Counter({('G', 'T'): 2, ('C', 'A'): 1}), Counter({('C', 'A'): 2, ('G', 'T'): 1, ('G', 'A'): 1})]
>>>
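One caveat with zip(*bases): zip stops at the shortest input, so if the strings in a group ever differ in length, the extra bases are silently dropped. A minimal sketch of a guard, assuming you would rather fail loudly; strand_pairs is a hypothetical helper name. On Python 3.10+ you can instead write zip(*bases, strict=True), which raises ValueError on unequal lengths.

```python
def strand_pairs(strands):
    """Pair up bases position by position; reject strings of unequal length."""
    # Hypothetical helper: plain zip would truncate to the shortest string
    if len({len(s) for s in strands}) > 1:
        raise ValueError(f"strands differ in length: {strands!r}")
    return list(zip(*strands))

sample = [['CGG', 'ATT'], ['GCGC', 'TAAA']]
pairs = [strand_pairs(strands) for strands in sample]
print(pairs)
```
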
Here it is easier not to use a comprehension, so that we don't have to calculate the total more than once:
>>> pair_freq = []
>>> for count in pair_counts:
...     total = sum(count.values())
...     pair_freq.append({k: c/total for k, c in count.items()})
...
>>> pair_freq
[{('C', 'A'): 0.3333333333333333, ('G', 'T'): 0.6666666666666666}, {('G', 'T'): 0.25, ('C', 'A'): 0.5, ('G', 'A'): 0.25}]
>>>
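If you'd still like to stay with comprehensions, one option is to move the division into a small function, so the total is computed once per Counter. This is a sketch with a hypothetical helper named normalize; because every base in a string gets counted, dividing by sum(count.values()) gives the same result as dividing by the string's length, so the same helper covers both the base and the pair frequencies.

```python
from collections import Counter

def normalize(count):
    """Hypothetical helper: turn a Counter into relative frequencies."""
    total = sum(count.values())  # computed once per Counter
    return {key: n / total for key, n in count.items()}

sample = [['CGG', 'ATT'], ['GCGC', 'TAAA']]

# Counter accepts any iterable, including the tuples produced by zip
pair_freq = [normalize(Counter(zip(*strands))) for strands in sample]
base_freqs = [[normalize(Counter(seq)) for seq in strands] for strands in sample]

print(pair_freq)
print(base_freqs)
```
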