Question
I have a dataset and I would like to compute the probability of transition.
Each state is a three-letter code, and there are 13 possible states: CCE CRE DEE FOE GOE ICE ISE MEE PCE PRE PSE RLE WAE
For example,
# A<- c('A-A-A-B', 'A-A-A-A', 'A-B-C-D', 'A-A')
A<- c('CCE-CRE-DEE-DEE', 'FOE-FOE-GOE-GOE-GOE-ISE', 'ISE-PCE', 'ISE')
library('stringr')
B<- str_count(A, "-")
df<- data.frame(A, B)
I would like to count the transitions between states in each sequence: out of the total number of transitions, how many go to a different state (assuming A, B, C, D are different states)?
For the commented-out A-B-C-D example above, I am expecting the following output.
B is the total number of transitions occurring in the sequence; C is the number of transitions to a different state.
df$C
1
0
3
0
Answer 1:
You can use rle from base R:
sapply(strsplit(A, '-'), function(i)length(rle(i)$lengths) - 1)
#[1] 1 0 3 0   (result for the commented-out A-B-C-D example)
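To make the rle idea concrete, here is a minimal sketch applied to the three-letter-code vector from the question rather than the A-B-C-D one. The names `states` and `C` are only illustrative. rle() collapses consecutive repeats into runs, so the number of runs minus one is the number of changes of state.

```r
# The three-letter-code sequences from the question.
A <- c('CCE-CRE-DEE-DEE', 'FOE-FOE-GOE-GOE-GOE-ISE', 'ISE-PCE', 'ISE')

# Split each sequence into its individual states.
states <- strsplit(A, '-', fixed = TRUE)

# rle() merges consecutive identical states into a single run.
rle(states[[2]])$values
# [1] "FOE" "GOE" "ISE"

# B: total transitions = number of states in the sequence minus one.
B <- lengths(states) - 1L

# C: transitions to a different state = number of runs minus one.
C <- vapply(states, function(s) length(rle(s)$lengths) - 1L, integer(1))

df <- data.frame(A, B, C)
df$C
# [1] 2 2 1 0
```

vapply is used instead of sapply only so that the result is guaranteed to be an integer vector; sapply gives the same numbers here.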
Answer 2:
You could use gsub from base R to collapse repeated states, then count the remaining hyphens with str_count from stringr:
str_count(gsub('(\\w+)(-?\\1)+','\\1', A),'-')
EDIT: to get the number of distinct runs (consecutive blocks of the same state) instead of the number of transitions, just add 1 to the result above.
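One caveat with the pattern above: the hyphen is optional and the match is not anchored to whole codes, so with multi-letter states it may in principle match inside a code or across a code boundary. A slightly stricter variant, offered only as a sketch, requires the hyphen and uses word boundaries with perl = TRUE. The helper `count_hyphens` is a hypothetical name, used here so the example needs no packages.

```r
A <- c('CCE-CRE-DEE-DEE', 'FOE-FOE-GOE-GOE-GOE-ISE', 'ISE-PCE', 'ISE')

# Collapse runs of the same whole state into a single occurrence.
# \\b anchors the match to the start and end of a code, and the
# hyphen is mandatory, so only complete repeated codes are merged.
collapsed <- gsub('\\b(\\w+)(?:-\\1\\b)+', '\\1', A, perl = TRUE)
collapsed
# [1] "CCE-CRE-DEE" "FOE-GOE-ISE" "ISE-PCE"     "ISE"

# Hypothetical helper: count hyphens using base R only.
count_hyphens <- function(x) {
  lengths(regmatches(x, gregexpr('-', x, fixed = TRUE)))
}

count_hyphens(collapsed)
# [1] 2 2 1 0
```

If stringr is already loaded, str_count(collapsed, '-') can replace the helper.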
Source: https://stackoverflow.com/questions/64771962/transitions-in-a-sequence