Count the maximum of consecutive letters in a string

那年仲夏 posted on 2019-12-10 15:16:21


I have this vector:

vector <- c("XXXX-X-X", "---X-X-X", "--X---XX", "--X-X--X", "-X---XX-", "-X--X--X", "X-----XX", "X----X-X", "X---XX--", "XX--X---", "---X-XXX", "--X-XX-X")

I want to find the maximum number of consecutive occurrences of X in each string. So, my expected vector would be:

4, 1, 2, 1, 2, 1, 2, 1, 2, 2, 3, 2


In base R, we can split each string into separate characters and then use rle to find the maximum run length for "X".

sapply(strsplit(vector, ""), function(x) {
   inds = rle(x)                           # run lengths of identical adjacent characters
   max(inds$lengths[inds$values == "X"])   # longest run whose value is "X"
})

#[1] 4 1 2 1 2 1 2 1 2 2 3 2
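
To make the rle step concrete, here is a small sketch that inspects what rle returns for one string from the question. The helper max_run is a hypothetical name, not part of the answer above; it also assumes you want 0 rather than -Inf (with a warning) when a string contains no "X".

```r
# Illustration only: inspect rle() on one string from the question.
chars <- strsplit("--X---XX", "")[[1]]
runs <- rle(chars)
runs$values    # "-" "X" "-" "X"
runs$lengths   # 2 1 3 2

# Hypothetical helper (not from the answer): longest run of `ch` in `s`,
# returning 0 when `ch` does not occur at all.
max_run <- function(s, ch = "X") {
  r <- rle(strsplit(s, "")[[1]])
  max(0L, r$lengths[r$values == ch])
}

max_run("--X---XX")   # 2
max_run("--------")   # 0
```

Including 0L in the call to max is one way to avoid the empty-vector warning; whether 0 is the right answer for a string without "X" depends on your data.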


Here is a slightly different approach. We can split each string in the input vector on runs of one or more dashes, then take the length of the longest resulting piece.

sapply(vector, function(x) {
    max(nchar(unlist(strsplit(x, "-+"))))
})

XXXX-X-X ---X-X-X --X---XX --X-X--X -X---XX- -X--X--X X-----XX X----X-X 
       4        1        2        1        2        1        2        1 
X---XX-- XX--X--- ---X-XXX --X-XX-X 
       2        2        3        2 

I suspect that X really just represents any non-dash character, so we don't need to check for it explicitly. If you really do want to count only X, we can remove all non-X characters from each piece before counting:

sapply(vector, function(x) {
    max(nchar(gsub("[^X]", "", unlist(strsplit(x, "-+")))))
})
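
The two variants only differ when a string contains characters other than "X" and "-". The sketch below uses a made-up input (the "A" characters are an assumption, not from the question) to show the difference; the variable names are illustrative.

```r
# Hypothetical input: "A" stands for a non-dash character that is not "X".
x <- c("AAX-XX", "XXXX-X-X")

# Variant 1: longest run of any non-dash characters.
any_char <- sapply(x, function(s) {
  max(nchar(unlist(strsplit(s, "-+"))))
}, USE.NAMES = FALSE)
any_char   # 3 4

# Variant 2: drop everything that is not "X" from each piece, then count.
only_x <- sapply(x, function(s) {
  max(nchar(gsub("[^X]", "", unlist(strsplit(s, "-+")))))
}, USE.NAMES = FALSE)
only_x     # 2 4
```

Note that the second variant counts the X characters within each dash-delimited piece, so a piece such as "XAX" would give 2 even though its X characters are not adjacent.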


Use strapply from the gsubfn package to extract the runs of X, applying nchar to each match to count its characters. This produces a list of vectors of run lengths; then sapply the max function over each such vector.


library(gsubfn)

sapply(strapply(vector, "X+", nchar), max)
## [1] 4 1 2 1 2 1 2 1 2 2 3 2
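
If gsubfn is not available, the same extract-then-measure idea can be sketched in base R with gregexpr and regmatches. The function name max_x_run is hypothetical, and returning 0 for strings without any "X" is an assumption about the desired behaviour.

```r
# Base R sketch of the same idea (hypothetical helper, not from the answer).
max_x_run <- function(v) {
  # For each string, collect every maximal run of "X" as a character vector.
  runs <- regmatches(v, gregexpr("X+", v))
  # Longest run per string; 0L covers strings with no match.
  vapply(runs, function(r) max(0L, nchar(r)), integer(1))
}

max_x_run(c("XXXX-X-X", "--X---XX", "----"))
# 4 2 0
```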


Here are a couple of tidyverse alternatives:

library(purrr)
library(stringr)

map_dbl(vector, ~sum(str_detect(., strrep("X", 1:8))))
# [1] 4 1 2 1 2 1 2 1 2 2 3 2
map_dbl(strsplit(vector,"-"), ~max(nchar(.)))
# [1] 4 1 2 1 2 1 2 1 2 2 3 2
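
The first alternative may look surprising, so here is a base R sketch of why it works: a string whose longest run of X has length k contains "X", "XX", ..., up to k X's as substrings, and nothing longer, so counting the patterns that match gives k. The helper name count_by_detect is hypothetical, and using the longest string length as the upper bound (instead of the hard-coded 8) is an assumption for generality.

```r
# Base R sketch of the pattern-counting trick (hypothetical helper).
count_by_detect <- function(v) {
  # Candidate patterns: "X", "XX", ..., up to the longest string length.
  patterns <- strrep("X", seq_len(max(nchar(v))))
  vapply(v, function(s) {
    hits <- vapply(patterns, function(p) grepl(p, s, fixed = TRUE), logical(1))
    sum(hits)   # number of run lengths 1..n present in s
  }, integer(1), USE.NAMES = FALSE)
}

count_by_detect(c("--X-XX-X", "XXXX-X-X", "--------"))
# 2 4 0
```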

