How to calculate counts and frequencies for pairs in a list of lists?

我寻月下人不归 2021-01-29 00:38

Bases refer to A, T, G and C.

sample = [['CGG','ATT'],['GCGC','TAAA']]

# Note on fragility of data: each element can be made up of only 2 of the 4 bases


        
1 answer
  • 2021-01-29 01:18

    You are not really using Counter any differently from a plain dict. Counter can count the characters of a string directly, so try something like the following approach:

    >>> sample = [['CGG','ATT'],['GCGC','TAAA']]
    >>> from collections import Counter
    >>> base_counts = [[Counter(base) for base in sub] for sub in sample]
    >>> base_counts
    [[Counter({'G': 2, 'C': 1}), Counter({'T': 2, 'A': 1})], [Counter({'G': 2, 'C': 2}), Counter({'A': 3, 'T': 1})]]
    

    Now you can continue with a functional approach using nested comprehensions to transform your data*:

    >>> base_freqs = [[{base: n/len(seq) for base, n in count.items()} for count, seq in zip(counts, bases)]
    ...               for counts, bases in zip(base_counts, sample)]
    >>> 
    >>> base_freqs
    [[{'G': 0.6666666666666666, 'C': 0.3333333333333333}, {'A': 0.3333333333333333, 'T': 0.6666666666666666}], [{'G': 0.5, 'C': 0.5}, {'A': 0.75, 'T': 0.25}]]
    >>> 
    

    *Note, some people do not like big, nested comprehensions like that. I think it's fine as long as you are sticking to functional constructs and not mutating data structures inside your comprehensions. I actually find it very expressive. Others disagree vehemently. You can always unfold that code into nested for-loops.
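
For readers who prefer explicit loops, the base-frequency step can be unfolded roughly as follows. This is only a sketch: the names `seq`, `total` and `freqs_for_entry` are chosen here for illustration and do not appear in the answer. Each frequency is taken relative to the length of its own sequence.

```python
from collections import Counter

sample = [['CGG', 'ATT'], ['GCGC', 'TAAA']]

base_freqs = []
for bases in sample:                  # one entry, e.g. ['CGG', 'ATT']
    freqs_for_entry = []
    for seq in bases:                 # one sequence, e.g. 'CGG'
        counts = Counter(seq)         # count each base in this sequence
        total = len(seq)              # divide by this sequence's own length
        freqs_for_entry.append({base: n / total for base, n in counts.items()})
    base_freqs.append(freqs_for_entry)

print(base_freqs)
```

The result has the same nesting as `sample`: one list per entry, and one dict of frequencies per sequence.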

    You can then do the same thing with the pairs. First, zip the two sequences of each entry together, position by position:

    >>> pairs = [list(zip(*bases)) for bases in sample]
    >>> pairs
    [[('C', 'A'), ('G', 'T'), ('G', 'T')], [('G', 'T'), ('C', 'A'), ('G', 'A'), ('C', 'A')]]
    >>> pair_counts = [Counter(base_pair) for base_pair in pairs]
    >>> pair_counts
    [Counter({('G', 'T'): 2, ('C', 'A'): 1}), Counter({('C', 'A'): 2, ('G', 'T'): 1, ('G', 'A'): 1})]
    >>> 
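
One thing to keep in mind with `zip`: it stops at the end of the shorter input, so two sequences of unequal length are truncated silently rather than rejected. Below is a minimal sketch of a guard, assuming the two sequences of each entry are meant to have the same length. The helper name `paired_positions` is hypothetical and not part of the answer.

```python
from collections import Counter


def paired_positions(seq_a, seq_b):
    """Return position-wise pairs, refusing sequences of unequal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError(f"length mismatch: {len(seq_a)} vs {len(seq_b)}")
    return list(zip(seq_a, seq_b))


# Plain zip silently drops the trailing 'C' of the longer sequence:
truncated = list(zip('GCGC', 'TAA'))   # [('G', 'T'), ('C', 'A'), ('G', 'A')]

# With equal lengths the helper behaves exactly like list(zip(...)):
pairs = paired_positions('GCGC', 'TAAA')
pair_counts = Counter(pairs)
print(pair_counts)
```

Whether a mismatch should raise, be padded, or be ignored depends on the data. Raising is just the most conservative choice for a sketch.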
    

    Here a plain loop is easier than a comprehension, because total is computed once per counter instead of once per pair:

    >>> pair_freq = []
    >>> for count in pair_counts:
    ...   total = sum(count.values())
    ...   pair_freq.append({k:c/total for k,c in count.items()})
    ... 
    >>> pair_freq
    [{('C', 'A'): 0.3333333333333333, ('G', 'T'): 0.6666666666666666}, {('G', 'T'): 0.25, ('C', 'A'): 0.5, ('G', 'A'): 0.25}]
    >>> 
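
The pair steps can also be folded into one small function. This is a sketch under the same assumptions as the transcript above (each entry of `sample` holds sequences to be compared position by position). The function name `pair_frequencies` is made up for this example.

```python
from collections import Counter


def pair_frequencies(sample):
    """Map each position-wise pair to its relative frequency, per entry."""
    result = []
    for bases in sample:
        counts = Counter(zip(*bases))      # count pairs such as ('G', 'T')
        total = sum(counts.values())       # number of positions in this entry
        result.append({pair: n / total for pair, n in counts.items()})
    return result


sample = [['CGG', 'ATT'], ['GCGC', 'TAAA']]
pair_freq = pair_frequencies(sample)
print(pair_freq)
```

Passing the `zip` iterator straight to `Counter` avoids building the intermediate `pairs` list, which only matters for long sequences.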
    