frequency of letters in column python


I want to calculate the frequency of occurrence of each letter in every column. For example, I have these three sequences:

seq1=AATC
seq2=GCCT
seq3=ATCA

2 Answers

南方客

2021-01-22 09:31

As with my answer to your last question, you should wrap your functionality in a function:

def lettercount(pos):
    return {c: pos.count(c) for c in pos}

Then you can easily apply it to the tuples from zip:

counts = [lettercount(t) for t in zip(seq1, seq2, seq3)]
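
For what it's worth, the standard library's `collections.Counter` gives the same per-column counts while tallying each tuple in a single pass, instead of calling `count` once per letter. A minimal sketch, assuming the three sequences from the question are plain strings (the name `lettercount_counter` is made up for illustration):

```python
from collections import Counter

# Sequences as given in the question
seq1, seq2, seq3 = "AATC", "GCCT", "ATCA"

def lettercount_counter(pos):
    # Counter tallies every letter in one pass; dict() returns a plain mapping
    return dict(Counter(pos))

# zip pairs up the letters found at the same position in each sequence
counts = [lettercount_counter(t) for t in zip(seq1, seq2, seq3)]
print(counts)
```

Note that `zip` stops at the shortest sequence, so this assumes all sequences have the same length.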

Or combine it into the existing loop:

...
counts = []
for position in zip(seq1, seq2, seq3): # sets at same position
    counts.append(lettercount(position))
    for pair in combinations(position, 2): # pairs within set
        ...


不思量自难忘°

2021-01-22 09:31
Here:
```
sequences = ['AATC',
             'GCCT',
             'ATCA']
f = zip(*sequences)
counts = [{letter: column.count(letter) for letter in column} for column in f]
print(counts)
```
Output (reformatted):
```
[{'A': 2, 'G': 1}, 
 {'A': 1, 'C': 1, 'T': 1}, 
 {'C': 2, 'T': 1}, 
 {'A': 1, 'C': 1, 'T': 1}]
```
Salient features:
- Rather than explicitly naming seq1, seq2, etc., we put them into a list.
- We unpack the list with the * operator, so zip receives each sequence as a separate argument and yields one tuple per column.
- We use a dict comprehension inside a list comprehension to generate the counts for each letter in each column. It's basically what you did for the one-sequence case, but more readable (IMO).
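
If you want relative frequencies rather than raw counts, you can divide each count by the column length. A rough sketch along the same lines (the name `frequencies` is mine, not from the question, and it assumes all sequences are equally long):

```python
sequences = ['AATC',
             'GCCT',
             'ATCA']

# One tuple per column; each tuple holds len(sequences) letters
frequencies = [
    {letter: column.count(letter) / len(column) for letter in set(column)}
    for column in zip(*sequences)
]
print(frequencies)
```

Each dict then maps a letter to the fraction of sequences that have it at that position, so the values in every column sum to 1 (up to floating-point rounding).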