Python switching multiple positions in string each to multiple letters

Question

I am trying to write Python code that finds restriction enzyme sites within a DNA sequence. Restriction enzymes cut at specific DNA sequences; however, some are less strict. For example, XmnI cuts this sequence:

GAANNNNTTC

where N can be any nucleotide (A, C, G, or T). With four Ns, that's 4^4 = 256 unique sequences it can cut. I want to build a list of these 256 short sequences and then check each one against a (longer) input DNA sequence. However, I'm having a hard time generating the 256 sequences. Here's what I have so far:

cutsequencequery = "GAANNNNTTC"
Nseq = ["A", "C", "G", "T"]
querylist = []
if "N" in cutsequencequery:
    Nlist = [cutsequencequery.replace("N", t) for t in Nseq]
    for j in list(Nlist):
        querylist.append(j)

for i in querylist:
    print(i)
print(len(querylist))

And here is the output:

GAAAAAATTC
GAACCCCTTC
GAAGGGGTTC
GAATTTTTTC
4

So it replaces every N with the same letter at once (all A, all C, all G, or all T), which gives only 4 sequences. I think I need another loop (or 3?) to generate all 256 combinations. Is there an efficient way to do this that I'm not seeing?
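For comparison, the extra-loop idea does work when the number of Ns is known in advance. A minimal sketch, assuming exactly four contiguous Ns as in GAANNNNTTC: one nested loop per N, with the NNNN block replaced by each four-letter combination.

```python
cutsequencequery = "GAANNNNTTC"
Nseq = ["A", "C", "G", "T"]

querylist = []
# one loop per N: this is hard-coded for a site with exactly four contiguous Ns
for a in Nseq:
    for b in Nseq:
        for c in Nseq:
            for d in Nseq:
                # swap the whole NNNN block for this particular 4-letter combination
                querylist.append(cutsequencequery.replace("NNNN", a + b + c + d))

print(querylist[0], querylist[-1])  # GAAAAAATTC GAATTTTTTC
print(len(querylist))  # 256
```

The drawback is that the nesting depth is fixed in the code, so a site with a different number of Ns would need a different number of loops.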

Answer 1:

Maybe you should take a look at Python's itertools module, which includes product(). It returns an iterator over the Cartesian product of its input iterables, i.e. every combination that takes one element from each. For example:

from itertools import product

cutsequencequery = "GAANNNNTTC"
nseq = ["A", "C", "G", "T"]

size = cutsequencequery.count('N')

possibilities = product(nseq, repeat=size)  # same as product(*[nseq for i in range(size)])
# yields ('A', 'A', 'A', 'A'), ..., ('T', 'T', 'T', 'T'): 4**4 = 256 tuples
# it is a one-shot iterator, so calling list() on it here would leave nothing for the loop

s = set()
for n in possibilities:
    print(''.join(n))  # 'AAAA', ..., 'TTTT'
    # replacing the whole 'NNNN' block works because the Ns in this site are contiguous
    new_sequence = cutsequencequery.replace('N' * size, ''.join(n))
    s.add(new_sequence)
    print(new_sequence)  # 'GAAAAAATTC', ..., 'GAATTTTTTC'
print(len(s))  # 256 unique sequences
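The replace('N' * size, ...) call relies on all the Ns forming a single contiguous block, which is true for GAANNNNTTC but need not hold for other degenerate sites. Below is a minimal sketch (not part of the original answer) that fills each N position independently with itertools.product and then carries out the question's second step, checking every expanded sequence against a longer DNA string. The helpers expand_site and find_sites, and the test sequence, are hypothetical and made up for illustration.

```python
from itertools import product


def expand_site(site, bases="ACGT"):
    """Hypothetical helper: list every concrete sequence matched by a site written with Ns.

    Each N position is filled independently, so unlike replace('N' * size, ...)
    this also works when the Ns are not contiguous.
    """
    n_positions = [i for i, ch in enumerate(site) if ch == "N"]
    expanded = []
    for combo in product(bases, repeat=len(n_positions)):
        letters = list(site)
        for pos, base in zip(n_positions, combo):
            letters[pos] = base
        expanded.append("".join(letters))
    return expanded


def find_sites(dna, candidates):
    """Hypothetical helper: sorted (position, site) pairs for every hit of any candidate.

    str.find is restarted one base after each hit, so overlapping hits are kept.
    """
    hits = []
    for site in set(candidates):  # scan each distinct candidate once
        start = dna.find(site)
        while start != -1:
            hits.append((start, site))
            start = dna.find(site, start + 1)
    return sorted(hits)


xmni = expand_site("GAANNNNTTC")
print(len(xmni), xmni[0], xmni[-1])  # 256 GAAAAAATTC GAATTTTTTC
print(expand_site("GNTNC")[:3])  # ['GATAC', 'GATCC', 'GATGC']

dna = "CCGAAACGTTTCAGAATTTTTTCGG"  # made-up test sequence
print(find_sites(dna, xmni))  # [(2, 'GAAACGTTTC'), (13, 'GAATTTTTTC')]
```

str.find is used here to keep the question's list-then-check plan. Turning each N into a character class and searching with a regular expression would avoid building the list at all, but that is a different technique from the one discussed in this thread.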

Source: https://stackoverflow.com/questions/64435037/python-switching-multiple-positions-in-string-each-to-multiple-letters

Tags: python, string, loops, random, Sequence