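I have a list of strings (DNA sequences) made up of the letters A, T, C and G. I want to find all matching substrings of length k and insert their counts into a table whose columns are all possible combinations of the DNA alphabet (4^k combinations for substrings of length k).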
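For reference, the column set described above can be generated directly. This is a minimal sketch, assuming k = 4 as an example value and assuming the columns should be sorted alphabetically; `all_kmers` is just an illustrative variable name.

```r
# Enumerate every possible DNA word of length k (4^k of them),
# i.e. the column names of the desired table. k = 4 is an assumed example value.
k <- 4
bases <- c("A", "C", "G", "T")
# expand.grid builds all 4^k combinations; apply() pastes each row into one string
grid <- expand.grid(rep(list(bases), k), stringsAsFactors = FALSE)
all_kmers <- sort(apply(grid, 1, paste, collapse = ""))
length(all_kmers)   # 256
head(all_kmers, 3)  # "AAAA" "AAAC" "AAAG"
```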
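Another way to do this is with the stringi package: extract every overlapping substring of length `len` with `stri_sub()` and count them with `table()`: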
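library(stringi)  # provides stri_sub() and stri_length()
DNAlst <- list("CAAACTGATTTT", "GATGAAAGTAAAATACCG", "ATTATGC", "TGGA", "CGCGCATCAA", "ACACACACACCA")
len <- 4  # k-mer length
# Count every overlapping substring of length `len` in one sequence;
# seq_len(max(0, ...)) avoids 1:0 when a sequence is shorter than `len`
stri_sub_fun <- function(x) table(stri_sub(x, seq_len(max(0, stri_length(x) - len + 1)), length = len))
# lapply always returns a list; sapply could silently collapse equal-length results into a matrix
lapply(DNAlst, stri_sub_fun)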
[[1]]
AAAC AACT ACTG ATTT CAAA CTGA GATT TGAT TTTT
1 1 1 1 1 1 1 1 1
[[2]]
AAAA AAAG AAAT AAGT AATA ACCG AGTA ATAC ATGA GAAA GATG GTAA TAAA TACC TGAA
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[[3]]
ATGC ATTA TATG TTAT
1 1 1 1
[[4]]
TGGA
1
[[5]]
ATCA CATC CGCA CGCG GCAT GCGC TCAA
1 1 1 1 1 1 1
[[6]]
ACAC ACCA CACA CACC
4 1 3 1
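The tables above only list the k-mers that actually occur in each sequence. To get the layout asked for in the question (one row per sequence, one column for each of the 4^k possible k-mers, zeros where a k-mer is absent), one option is to count against a fixed set of factor levels. This is a sketch, not part of the answer above: it assumes base R only (`substring()` instead of stringi), and `count_kmers` is a hypothetical helper name.

```r
# One row per sequence, one column per possible k-mer (4^k columns),
# zero-filled for k-mers that never occur. Base R only; count_kmers is hypothetical.
DNAlst <- list("CAAACTGATTTT", "GATGAAAGTAAAATACCG", "ATTATGC",
               "TGGA", "CGCGCATCAA", "ACACACACACCA")
k <- 4
bases <- c("A", "C", "G", "T")
all_kmers <- sort(apply(expand.grid(rep(list(bases), k), stringsAsFactors = FALSE),
                        1, paste, collapse = ""))

count_kmers <- function(x, k, levels) {
  n <- nchar(x) - k + 1
  if (n < 1) return(integer(length(levels)))  # sequence shorter than k
  starts <- seq_len(n)
  # overlapping substrings starting at every valid position
  kmers <- substring(x, starts, starts + k - 1)
  # a factor with fixed levels makes table() report every k-mer, including zeros
  as.vector(table(factor(kmers, levels = levels)))
}

counts <- t(vapply(DNAlst, count_kmers, integer(length(all_kmers)),
                   k = k, levels = all_kmers))
colnames(counts) <- all_kmers
dim(counts)                                   # 6 256
counts[6, c("ACAC", "CACA", "ACCA", "CACC")]  # 4 3 1 1
```

Fixing the factor levels up front is what guarantees every row has the same 4^k columns in the same order, so the per-sequence vectors can be stacked into a matrix without any name matching.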