storing long strings (DNA sequence) in R

刺人心 2020-12-21 02:11

I have written a function that finds the indices of subsequences in a long DNA sequence. It works when the longer DNA sequence is shorter than about 4,000 characters. However, when I

2 Answers
  • 2020-12-21 02:51

    Rather than writing your own function, why not use the function words.pos in the seqinr package? It seems to work even for strings of up to a million base pairs.

    For example,

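    library(seqinr)
    data(ec999)  # example E. coli coding sequences shipped with seqinr
    # Collapse the first sequence into one string, then repeat it 100 times
    myseq <- paste(ec999[[1]], collapse="")
    myseq <- paste(rep(myseq,100), collapse="")
    words.pos("atat", myseq)  # start positions of "atat" in myseq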
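
    If installing a package is not an option, the same kind of lookup can be sketched in base R. This is only a minimal sketch under stated assumptions: find_positions is a hypothetical helper name (it is not part of seqinr), the pattern is treated as a regular expression, and a zero-width lookahead is used so that overlapping occurrences are also reported.

    ```r
    # Hypothetical helper (not from seqinr): start positions of every,
    # possibly overlapping, occurrence of `pattern` in `sequence`.
    find_positions <- function(pattern, sequence) {
      # Zero-width lookahead so that overlapping matches are all reported;
      # note that `pattern` is interpreted as a (Perl-style) regex here.
      hits <- gregexpr(paste0("(?=", pattern, ")"), sequence, perl = TRUE)[[1]]
      if (hits[1] == -1) integer(0) else as.integer(hits)
    }

    # Build an 8,000-character test sequence without typing it as a literal
    long_seq <- paste(rep("acgtatat", 1000), collapse = "")
    nchar(long_seq)
    positions <- find_positions("atat", long_seq)
    head(positions)
    ```

    Because the sequence is built with paste() rather than typed or pasted into the console, this also sidesteps any limit on the length of a single input line.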
    
  • 2020-12-21 02:59

    I can replicate nrussell's example, but the following assigns correctly: x <- paste0(rep("abcdef", 1000), collapse = ""). A potential workaround is to write the character string to a .txt file and read that file into R directly:

    Here, test.txt contains a single 6,000-character string.

    test <- read.table('test.txt', stringsAsFactors = FALSE)
    length(class(test[1,1]))
    # [1] 1
    class(test[1,1])
    # [1] "character"
    nchar(test[1,1])
    # [1] 6000
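
    The file round trip can also be sketched with readLines(), which returns the file contents as plain character strings rather than a data frame. This is an illustrative sketch only: the temporary file stands in for test.txt, and the variable names are assumptions, not part of the original example.

    ```r
    # Hypothetical file: tempfile() stands in for a real path such as 'test.txt'
    path <- tempfile(fileext = ".txt")

    # Build a 6,000-character string and write it to the file as one line
    long_seq <- paste0(rep("abcdef", 1000), collapse = "")
    writeLines(long_seq, path)

    # readLines() returns one character string per line of the file
    seq_from_file <- readLines(path)
    nchar(seq_from_file)

    # Clean up the temporary file
    unlink(path)
    ```

    For a file that holds the sequence on a single line, seq_from_file is a length-one character vector and can be passed directly to a search function.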
    