Faster way to split a string and count characters using R?

太阳男子 2021-02-01 08:51

I'm looking for a faster way to calculate GC content for DNA strings read in from a FASTA file. This boils down to taking a string and counting the number of times that the let

6 answers
  •  囚心锁ツ
    2021-02-01 09:04

    Better to not split at all, just count the matches:

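    # Count G/C characters (either case) in positions st..sp of line.
    # gregexpr() returns -1 when nothing matches, so "> 0" yields a count of 0.
    gcCount2 <- function(line, st, sp){
      sum(gregexpr('[GCgc]', substr(line, st, sp))[[1]] > 0)
    }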
    

    That's an order of magnitude faster than splitting the string into characters first.

    A small C function that just iterates over the characters would be yet another order of magnitude faster.
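As a rough illustration of what such a C function might look like, here is a minimal sketch. The name `gc_count` and its 1-based, inclusive `st`/`sp` arguments are assumptions chosen to mirror `substr(line, st, sp)` above; wiring it into R (for example through `.Call` or Rcpp) is left out, and the speedup has not been measured here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count G/C characters (either case) in the 1-based,
   inclusive range [st, sp] of line, mirroring substr(line, st, sp) in R. */
size_t gc_count(const char *line, size_t st, size_t sp)
{
    size_t len = strlen(line);
    size_t count = 0;

    /* Clamp the range to the string, as substr() does. */
    if (st < 1) st = 1;
    if (sp > len) sp = len;

    /* Single pass over the characters; no splitting, no regex. */
    for (size_t i = st; i <= sp; i++) {
        switch (line[i - 1]) {
        case 'G': case 'C': case 'g': case 'c':
            count++;
            break;
        default:
            break;
        }
    }
    return count;
}
```

Dividing the result by the window length (`sp - st + 1`) would then give the GC fraction for that window.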
