Error in levels for seqdef in R

£可爱£侵袭症+ submitted on 2019-12-23 03:14:23

Question


I see this error every time I try to run seqdef on data that has already been converted to STS format using seqformat. A sample of my data frame looks like this:

head(df.new, 10)
   user_id orderdate         cart to
1        8         1      produce 30
2        8        31      produce 60
3        8        61      produce 70
4        8        71      produce 92
5       10         1      produce 30
6       10        31      produce 42
7       10        43 meat seafood 56
8       10        57         deli 77
9       17         1    beverages  3
10      17         4    beverages  8

It has a total of 14000 rows of orders, and there are some orders that occur on the same day for a given user (i.e. orderdate == to). Below is the code I used to create the STS data that is passed as input to seqdef.

df.form <- seqformat(df.new, id='user_id', begin='orderdate', end='to', status='cart', from='SPELL', to='STS', process=FALSE)
df.seq <- seqdef(df.form, left='DEL', right = 'unknown', xtstep=10, void = 'unknown')

The output and error message I get when running seqdef are:

 [>] found missing values ('NA') in sequence data
 [>] preparing 35000 sequences
 [>] coding void elements with 'unknown' and missing values with '*'
 [>] 21 distinct states appear in the data: 
     1 = alcohol
     2 = babies
     3 = bakery
     4 = beverages
     5 = breakfast
     6 = bulk
     7 = canned goods
     8 = dairy eggs
     9 = deli
     10 = dry goods pasta
     11 = frozen
     12 = household
      ...
 [>] adding special state(s) to the alphabet: unknown
Error in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  : 
  factor level [24] is duplicated

I tried removing those orders where orderdate == to and the same error still occurs. I would appreciate any help I can get to fix this problem. Thanks.


Answer 1:


The error occurs because you are using the same code ('unknown') for both right missings (right = 'unknown') and voids (void = 'unknown'). Each of the two needs its own code.
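
The "factor level [24] is duplicated" message is what base R reports when the same name appears twice in a vector of factor levels. Here is a minimal sketch of that mechanism without TraMineR. The short alphabet is made up for illustration, and the idea that seqdef appends the special codes to the state alphabet is an assumption read off the log in the question, not taken from the package source.

```r
# Hypothetical alphabet: two real states plus the special codes.
# 'unknown' is listed twice, once for right missings and once for voids.
dup.codes <- c("deli", "produce", "unknown", "*", "unknown")

# factor() rejects a levels vector that contains the same name twice;
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch({
  factor(c("deli", "unknown"), levels = dup.codes)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# With a distinct void code (here '%', the seqdef default), the levels are unique.
ok.codes <- c("deli", "produce", "unknown", "*", "%")
f <- factor(c("deli", "unknown"), levels = ok.codes)
print(nlevels(f))
```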

The two are not treated alike. When the sequences contain missings, these are considered a separate state when you set with.missing = TRUE in functions such as seqdist or seqdplot, whereas voids only serve to adjust the row lengths and are simply ignored when plotting the sequences (seqplot) or computing dissimilarities (seqdist).
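
A minimal sketch of a seqdef call that keeps the two codes apart, assuming the TraMineR package is installed. The toy STS data frame is made up for illustration and is not the asker's data; the call keeps 'unknown' for right missings and goes back to '%' (the seqdef default) for voids.

```r
library(TraMineR)

# Toy STS data (made up for illustration): one row per user, one column per
# time position, NA where the sequence has already ended.
sts <- data.frame(
  t1 = c("produce", "produce", "beverages"),
  t2 = c("produce", "deli", "beverages"),
  t3 = c("produce", NA, NA),
  stringsAsFactors = FALSE
)

# Same arguments as in the question, except that right missings and voids
# now use different codes: 'unknown' for right missings, '%' for voids.
seq.ok <- seqdef(sts, left = "DEL", right = "unknown", void = "%")
print(seq.ok)
```

If the trailing missings should not count as a state at all, leaving right at its default ('DEL') while keeping void = 'unknown' is another way to avoid the clash. Which of the two fits depends on whether 'unknown' is meant to be analysed as a state or only used as padding.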



Source: https://stackoverflow.com/questions/44961885/error-in-levels-for-seqdef-in-r