I'm looking for a faster way to calculate GC content for DNA strings read in from a FASTA file. This boils down to taking a string and counting the number of times that the let
Better to not split at all, just count the matches:
# gregexpr() returns -1 when nothing matches, so "> 0" counts only real match positions.
gcCount2 <- function(line, st, sp){ sum(gregexpr('[GCgc]', substr(line, st, sp))[[1]] > 0) }
That's an order of magnitude faster than the version that splits the string into characters.
A small C function that just iterates over the characters would be yet another order of magnitude faster.
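For what it's worth, here is a minimal sketch of what such a C function might look like. The name gc_count and the 1-based inclusive st/sp arguments (chosen to mirror substr() in R) are my own assumptions, and it has not been benchmarked against the R versions; wiring it into R (for example through .Call or Rcpp) is left out.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count G/C characters (either case) in the
 * 1-based, inclusive range [st, sp] of a NUL-terminated sequence.
 * Out-of-range bounds are clamped to the string, as substr() does in R.
 */
size_t gc_count(const char *seq, size_t st, size_t sp)
{
    size_t len = strlen(seq);
    size_t count = 0;

    if (st < 1) st = 1;      /* clamp start to the first character */
    if (sp > len) sp = len;  /* clamp stop to the last character */

    /* Single pass over the requested window; no allocation, no regex. */
    for (size_t i = st; i <= sp; i++) {
        switch (seq[i - 1]) {
        case 'G': case 'C': case 'g': case 'c':
            count++;
            break;
        default:
            break;
        }
    }
    return count;
}
```

Called as gc_count(line, st, sp), it should return the same count as gcCount2 for plain ASCII sequences; an empty window (st greater than sp) simply yields 0.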