Frequency of letters in each column in Python

感情败类 2021-01-22 09:12

I want to calculate the frequency of occurrence of each letter in every column. For example, I have these three sequences:

seq1=AATC
seq2=GCCT
seq3=ATCA
2 Answers
  • 2021-01-22 09:31

    As with my answer to your last question, you should wrap your functionality in a function:

    def lettercount(pos):
        return {c: pos.count(c) for c in pos}
    

    Then you can easily apply it to the tuples from zip:

    counts = [lettercount(t) for t in zip(seq1, seq2, seq3)]
    

    Or combine it into the existing loop:

    ...
    counts = []
    for position in zip(seq1, seq2, seq3): # sets at same position
        counts.append(lettercount(position))
        for pair in combinations(position, 2): # pairs within set
            ...
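
    For reference, here is a self-contained sketch of the list-comprehension version above, using the three sequences from the question. It assumes the sequences all have the same length; zip stops at the shortest one, so extra trailing letters would be silently ignored.

```python
def lettercount(pos):
    # pos is one column, e.g. ('A', 'G', 'A'); map each letter to its count.
    return {c: pos.count(c) for c in pos}

# The three sequences from the question.
seq1 = "AATC"
seq2 = "GCCT"
seq3 = "ATCA"

# zip() yields one tuple per column position across the sequences.
counts = [lettercount(t) for t in zip(seq1, seq2, seq3)]
print(counts)
```

    Each element of counts is a dict for one column, in column order.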
    
  • 2021-01-22 09:31

    Here is one way to do it:

    sequences = ['AATC',
                 'GCCT',
                 'ATCA']
    # zip(*sequences) transposes the sequences, yielding one tuple per column
    columns = zip(*sequences)
    counts = [{letter: column.count(letter) for letter in column} for column in columns]
    print(counts)
    

    Output (reformatted):

    [{'A': 2, 'G': 1}, 
     {'A': 1, 'C': 1, 'T': 1}, 
     {'C': 2, 'T': 1}, 
     {'A': 1, 'C': 1, 'T': 1}]
    

    Salient features:

    • Rather than explicitly naming seq1, seq2, etc., we put them into a list.
    • We unpack the list with the * operator.
    • We use a dict comprehension inside a list comprehension to generate the counts for each letter in each column. It's basically what you did for the one-sequence case, but more readable (IMO).
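
    If relative frequencies are wanted rather than raw counts, one possible variation (not part of the answer above) is to use collections.Counter from the standard library and divide by the number of sequences. The names column_counts and column_freqs are illustrative, and the sketch assumes all sequences have the same length.

```python
from collections import Counter

sequences = ["AATC", "GCCT", "ATCA"]

# One Counter per column; zip(*sequences) transposes rows into columns.
column_counts = [Counter(column) for column in zip(*sequences)]

# Divide each count by the number of sequences to get a fraction per column.
column_freqs = [
    {letter: n / len(sequences) for letter, n in counts.items()}
    for counts in column_counts
]

print(column_counts)
print(column_freqs)
```

    A Counter returns 0 for letters that do not occur in a column, which can be convenient when comparing columns against a fixed alphabet.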