Matching and counting strings (k-mers of DNA) in R

深忆病人 2021-02-06 11:55

I have a list of strings (DNA sequences) made up of A, T, C, G. I want to find all matches and insert them into a table whose columns are all possible combinations of that DNA alphabet (4^k

5 answers
  •  一整个雨季
    2021-02-06 12:37

    Another way to do this:

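    library(stringi)  # provides stri_sub() and stri_length()

    DNAlst <- list("CAAACTGATTTT", "GATGAAAGTAAAATACCG", "ATTATGC", "TGGA", "CGCGCATCAA", "ACACACACACCA")
    len <- 4  # k-mer length
    # Tabulate every length-`len` substring of x (assumes x has at least `len` characters)
    stri_sub_fun <- function(x) table(stri_sub(x, 1:(stri_length(x) - len + 1), length = len))
    lapply(DNAlst, stri_sub_fun)  # lapply always returns a list, one table per sequence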
    [[1]]
    
    AAAC AACT ACTG ATTT CAAA CTGA GATT TGAT TTTT 
       1    1    1    1    1    1    1    1    1 
    
    [[2]]
    
    AAAA AAAG AAAT AAGT AATA ACCG AGTA ATAC ATGA GAAA GATG GTAA TAAA TACC TGAA 
       1    1    1    1    1    1    1    1    1    1    1    1    1    1    1 
    
    [[3]]
    
    ATGC ATTA TATG TTAT 
       1    1    1    1 
    
    [[4]]
    
    TGGA 
       1 
    
    [[5]]
    
    ATCA CATC CGCA CGCG GCAT GCGC TCAA 
       1    1    1    1    1    1    1 
    
    [[6]]
    
    ACAC ACCA CACA CACC 
       4    1    3    1 
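
The per-sequence tables above only list the k-mers that actually occur. The question asks for a single table with one column for each of the 4^k possible k-mers, so here is a minimal sketch of one way to get there in base R. The helper names `all_kmers`, `kmer_counts` and `kmer_matrix` are made up for this example and do not come from any package. It also assumes the alphabet is exactly A, C, G, T, so windows containing any other character are silently dropped.

```r
# Enumerate all 4^k k-mers over the given alphabet, in sorted order.
all_kmers <- function(k, alphabet = c("A", "C", "G", "T")) {
  grid <- expand.grid(rep(list(alphabet), k), stringsAsFactors = FALSE)
  sort(do.call(paste0, grid), method = "radix")
}

# Count the k-mers of one sequence, including zeros for absent k-mers.
kmer_counts <- function(x, k, kmers = all_kmers(k)) {
  n <- nchar(x) - k + 1
  # A sequence shorter than k has no k-mers at all.
  if (n < 1) return(setNames(integer(length(kmers)), kmers))
  starts <- seq_len(n)
  found <- substring(x, starts, starts + k - 1)
  # Fixed factor levels keep a slot for every possible k-mer;
  # substrings outside the levels become NA and are not counted.
  tab <- table(factor(found, levels = kmers))
  setNames(as.integer(tab), kmers)
}

# One row per sequence, one column per possible k-mer.
kmer_matrix <- function(seqs, k) {
  kmers <- all_kmers(k)
  counts <- vapply(seqs, kmer_counts, integer(length(kmers)),
                   k = k, kmers = kmers, USE.NAMES = FALSE)
  m <- t(counts)
  colnames(m) <- kmers
  m
}

DNAlst <- list("CAAACTGATTTT", "GATGAAAGTAAAATACCG", "ATTATGC",
               "TGGA", "CGCGCATCAA", "ACACACACACCA")
m <- kmer_matrix(DNAlst, 4)
dim(m)                       # 6 rows, 256 columns
m[6, c("ACAC", "CACA")]      # 4 and 3, matching table [[6]] above
```

Fixing the factor levels up front is what keeps the zero columns. The cost is a dense 4^k-wide matrix, which grows quickly with k, so for large k a sparse representation is probably a better fit.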
    
