transitions in a sequence

Posted by 强颜欢笑 on 2020-12-26 12:04:58

Question


I have a dataset of state sequences and I would like to compute the probability of transition.

Each state is a three-letter code, and there are 13 possible states: CCE CRE DEE FOE GOE ICE ISE MEE PCE PRE PSE RLE WAE

For example,

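 # Simplified example (uncomment to use):
 # A <- c('A-A-A-B', 'A-A-A-A', 'A-B-C-D', 'A-A')
 A <- c('CCE-CRE-DEE-DEE', 'FOE-FOE-GOE-GOE-GOE-ISE', 'ISE-PCE', 'ISE')
 library('stringr')
 # B: number of hyphens, i.e. total transitions in each sequence
 B <- str_count(A, "-")
 df <- data.frame(A, B)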

I would like to count the transitions between states. For example, out of the total transitions in each sequence, how many go to a different state (assuming A, B, C and D are different states)?

I am expecting output as follows:

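Here B is the total number of transitions in each sequence, and C is the number of transitions to a different state. The values below correspond to the simplified, commented-out example ('A-A-A-B', 'A-A-A-A', 'A-B-C-D', 'A-A'):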

    df$C   
    1        
    0        
    3        
    0        

Answer 1:


You can use rle from base R:

sapply(strsplit(A, '-'), function(i)length(rle(i)$lengths) - 1)
#[1] 1 0 3 0
# (output for the simplified A-B-C-D example; the three-letter-code A gives 2 2 1 0)
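To see why this works, here is a minimal sketch on the three-letter codes from the question, using base R only. rle() collapses consecutive repeats into runs, so the number of runs minus one is the number of changes to a different state. The names `tokens` and `runs` are illustrative and not part of the original answer.

```r
A <- c('CCE-CRE-DEE-DEE', 'FOE-FOE-GOE-GOE-GOE-ISE', 'ISE-PCE', 'ISE')

# Split each sequence into its individual states.
tokens <- strsplit(A, '-', fixed = TRUE)

# rle() groups consecutive identical states into runs.
runs <- rle(tokens[[2]])
runs$values   # "FOE" "GOE" "ISE"
runs$lengths  # 2 3 1

# B: all transitions, including a state followed by itself.
B <- lengths(tokens) - 1

# C: transitions to a different state = number of runs - 1.
C <- vapply(tokens, function(x) length(rle(x)$lengths) - 1L, integer(1))

df <- data.frame(A, B, C)
df$C  # 2 2 1 0
```

vapply is used here instead of sapply only so that the result is guaranteed to be an integer vector; either works for this data.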



Answer 2:


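You could use gsub from base R to collapse repeated states, then count the remaining hyphens with str_count from stringr (already loaded in the question):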

 str_count(gsub('(\\w+)(-?\\1)+','\\1', A),'-')

EDIT: To get the number of runs (consecutive distinct states) instead, just add 1 to the results you have.
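One caveat with the pattern above: because the hyphen is optional (`-?`), the backreference can also match inside a single code (for example the repeated "C" in "CCE"). For state names where one code ends with the letter another begins with (a hypothetical pair such as "ICE-EGO"), it could merge two different states across the hyphen. That does not seem to affect the hyphen count for the 13 codes listed in the question. A stricter variant, offered as a sketch rather than as the answerer's method, anchors the match on whole states with `\\b` and `perl = TRUE`, and counts hyphens in base R so stringr is not needed. The name `collapsed` is illustrative.

```r
A <- c('CCE-CRE-DEE-DEE', 'FOE-FOE-GOE-GOE-GOE-ISE', 'ISE-PCE', 'ISE')

# Collapse runs of identical whole states into a single occurrence.
# \\b forces the match to start and end on state boundaries.
collapsed <- gsub('\\b(\\w+)(-\\1\\b)+', '\\1', A, perl = TRUE)
collapsed
# "CCE-CRE-DEE" "FOE-GOE-ISE" "ISE-PCE" "ISE"

# Count the remaining hyphens: each one is a transition to a different state.
C <- nchar(collapsed) - nchar(gsub('-', '', collapsed, fixed = TRUE))
C  # 2 2 1 0

# Number of runs (consecutive distinct states), as in the EDIT above.
C + 1  # 3 3 2 1
```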



Source: https://stackoverflow.com/questions/64771962/transitions-in-a-sequence
