traminer

Error in levels for seqdef in R

£可爱£侵袭症+ posted on 2019-12-23 03:14:23
Question: I see this error every time I try to run seqdef on data that has already been converted to STS format using seqformat. A sample of my data frame looks like this:

    head(df.new, 10)
       user_id orderdate         cart to
    1        8         1      produce 30
    2        8        31      produce 60
    3        8        61      produce 70
    4        8        71      produce 92
    5       10         1      produce 30
    6       10        31      produce 42
    7       10        43 meat seafood 56
    8       10        57         deli 77
    9       17         1    beverages  3
    10      17         4    beverages  8

It has a total of 14000 rows of orders, and there are some orders which occur on the same day for each user
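The excerpt mentions that some orders fall on the same day for a user. A minimal base-R sketch for flagging such overlapping spells before any conversion is shown here. The toy values and the column name `overlaps_previous` are made up for illustration, and whether overlaps are actually behind the reported error is not established by the excerpt.

```r
# Toy data mirroring the columns shown above (values are illustrative only)
orders <- data.frame(
  user_id   = c(8, 8, 8, 10, 10, 10),
  orderdate = c(1, 31, 61, 1, 31, 31),
  cart      = c("produce", "produce", "produce", "produce", "produce", "deli"),
  to        = c(30, 60, 70, 30, 42, 56),
  stringsAsFactors = FALSE
)

# Sort by user and start position so consecutive rows can be compared
orders <- orders[order(orders$user_id, orders$orderdate, orders$to), ]

# A spell overlaps when it starts at or before the end of the previous
# spell of the same user
same_user <- c(FALSE, orders$user_id[-1] == orders$user_id[-nrow(orders)])
prev_end  <- c(NA, orders$to[-nrow(orders)])
orders$overlaps_previous <- same_user & orders$orderdate <= prev_end

# Show only the flagged rows
print(orders[orders$overlaps_previous, ])
```

Rows flagged this way would need to be merged, dropped, or re-timed before building one state per position.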

An “asymmetric” pairwise distance matrix

我的梦境 posted on 2019-12-21 12:20:08
Question: Suppose there are three sequences to be compared: a, b, and c. Traditionally, the resulting 3-by-3 pairwise distance matrix is symmetric, indicating that the distance from a to b is equal to the distance from b to a. I am wondering whether TraMineR provides some way to produce an asymmetric pairwise distance matrix.

Answer 1: No, TraMineR does not produce 'asymmetric' dissimilarities, precisely for the reasons stressed in Pat's comment. The main interest of computing pairwise dissimilarities between
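To make the symmetry property concrete, here is a small base-R sketch that builds a 3-by-3 pairwise matrix for three toy sequences using a plain position-wise mismatch count. The sequences and the helper `mismatches()` are hypothetical, and this is not TraMineR's own distance routine.

```r
# Three toy sequences of equal length (illustrative values)
seqs <- list(
  a = c("A", "A", "B", "B"),
  b = c("A", "B", "B", "C"),
  c = c("C", "C", "B", "A")
)

# Hypothetical helper: number of positions where two sequences differ
mismatches <- function(x, y) sum(x != y)

# Fill the pairwise matrix by evaluating the helper on every pair
idx <- seq_along(seqs)
d <- outer(idx, idx, Vectorize(function(i, j) mismatches(seqs[[i]], seqs[[j]])))
dimnames(d) <- list(names(seqs), names(seqs))

print(d)
print(isSymmetric(d))  # the a-to-b entry equals the b-to-a entry
```

Because the mismatch count does not depend on the order of its arguments, the matrix equals its transpose.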

Creating a sequence object from SPELL data

眉间皱痕 posted on 2019-12-21 04:54:25
Question: I am trying to create a sequence object with seqdef using the SPELL format. Here is an example of my data:

    spell <- structure(list(
      ID = c(1, 3, 3, 4, 5, 5, 6, 8, 9, 10, 11, 11, 12, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 17, 17, 17, 18, 18, 18, 19, 19),
      status = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 3, 1, 1, 1, 3, 1, 3, 3, 1, 3, 1, 1, 1, 1, 1, 3, 3, 1, 3, 1, 1, 1),
      time1 = c(1, 1, 57, 1, 1, 91, 1, 1, 1, 1, 1,
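As an illustration of what a SPELL-to-STS conversion does, the following base-R sketch expands spell rows into one state per time position. The toy values and the end column `time2` are assumptions (the excerpt is cut off before any end column appears), and this is a hand-rolled expansion rather than TraMineR's own conversion.

```r
# Toy spell data: one row per spell (illustrative values; `time2` is assumed)
spell <- data.frame(
  ID     = c(1, 3, 3),
  status = c(1, 1, 2),
  time1  = c(1, 1, 4),
  time2  = c(6, 3, 6)
)

ids   <- as.character(unique(spell$ID))
max_t <- max(spell$time2)

# One row per ID, one column per time position, NA where nothing is observed
sts <- matrix(NA_real_, nrow = length(ids), ncol = max_t,
              dimnames = list(ids, paste0("t", seq_len(max_t))))

# Write each spell's status into the positions it covers
for (i in seq_len(nrow(spell))) {
  cols <- spell$time1[i]:spell$time2[i]
  sts[as.character(spell$ID[i]), cols] <- spell$status[i]
}

print(sts)
```

A wide matrix of this shape is the kind of state-sequence input that seqdef accepts by default.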

Find specific patterns in sequences

元气小坏坏 posted on 2019-12-18 05:58:09
Question: I'm using the R package TraMineR to do sequence analysis for academic research. I want to find a pattern defined as someone being in the target company, then leaving, then coming back to the target company. To simplify, I've defined state A as the target company, B as a company outside the industry, and C as a company inside the industry. So what I want to do is find sequences with the specific patterns A-B-A or A-C-A. After looking at this question (Strange number of subsequences?) and reading the user
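One way to think about the A-B-A / A-C-A search is to first collapse each sequence to its distinct successive states and then match a pattern on the collapsed string. The base-R sketch below does this with `rle()` and a regular expression. The toy sequences are made up, and this is an illustrative alternative rather than the approach given in the original answer (TraMineR itself offers `seqdss()` for extracting distinct successive states).

```r
# Toy state sequences (illustrative values; A = target company)
seqs <- list(
  s1 = c("A", "A", "B", "B", "A"),
  s2 = c("A", "B", "C", "C", "A"),
  s3 = c("C", "A", "A", "C", "A"),
  s4 = c("A", "A", "A", "B", "B")
)

# Collapse runs of identical states, e.g. A,A,B,B,A becomes "A-B-A"
dss <- vapply(seqs, function(x) paste(rle(x)$values, collapse = "-"),
              character(1))

# TRUE when A is followed by exactly one spell in B or C and then A again
has_return <- grepl("A-(B|C)-A", dss)
names(has_return) <- names(dss)

print(dss)
print(has_return)
```

Note that a detour through both B and C (as in `s2`) does not match, which follows the patterns as stated in the question; single-letter state names are assumed so that the regular expression stays unambiguous.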

TraMineR substitution cost

空扰寡人 posted on 2019-12-13 19:24:54
Question: I have a logical problem with the transition cost matrix. I am working on sequence dissimilarity using the R package TraMineR. Here is a very simple example that I hope explains my problem. There are three sequences, and I want to calculate the dissimilarity matrix. The alphabet is: H (in health), I (ill at home), IH (ill at hospital), D (died). I observe the 3 subjects for 5 observations. These are the sequences:

    H - H - I - D - D
    H - I - I - I - D
    I - I - H -
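To see how a substitution cost matrix feeds into a dissimilarity, the sketch below compares the first two sequences from the question position by position and sums the substitution costs. The cost values are hypothetical (chosen only for illustration), and this Hamming-style sum ignores insertions and deletions, so it is not a full optimal matching computation.

```r
alphabet <- c("H", "I", "IH", "D")

# Hypothetical substitution costs: larger for states that are further apart
sm <- matrix(c(0, 1, 2, 3,
               1, 0, 1, 2,
               2, 1, 0, 1,
               3, 2, 1, 0),
             nrow = 4, byrow = TRUE,
             dimnames = list(alphabet, alphabet))

# The two complete sequences given in the question
s1 <- c("H", "H", "I", "D", "D")
s2 <- c("H", "I", "I", "I", "D")

# Look up the cost of each aligned pair of states and add them up
pair_costs <- sm[cbind(s1, s2)]
d12 <- sum(pair_costs)

print(pair_costs)
print(d12)
```

Only positions 2 (H versus I) and 4 (D versus I) contribute here, so changing those two entries of the cost matrix changes the distance directly.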

Load frequent subsequences from TXT

孤人 posted on 2019-12-13 06:08:34
Question: Is it possible to load a list of frequent subsequences from a .txt file and make TraMineR recognize it as a sequence object? Unfortunately I don't have the raw data, so I am not able to recreate the analysis. The only file I have is a .txt file containing the frequent subsequences. I assume it was created with the seqefsub() function from the TraMineR package, with maxGap=2, because the data looks like the output of that function. read.table() reads it as a data frame

Inconsistency between sequences and seqiplot

ⅰ亾dé卋堺 posted on 2019-12-12 21:24:17
Question: I am using the function seqiplot to create a sequence index plot. The problem is that I get some inconsistencies between what is shown on the plot and my sequence data. For example, I have the same sequence state in periods t and t+1; however, the sequence index plot shows a different colour for each period. Shouldn't they have the same colour? I suspect that it has to do with the number of possible states in my data set. There are 60 different states. So when I try to set the colour scheme I
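With as many as 60 states, it can help to build the palette explicitly and check that no two states share a colour. The base-R sketch below is one way to do that; `rainbow()` is only one possible generator, and the excerpt does not confirm that duplicated colours are the cause of the mismatch.

```r
# Assumption: 60 states, as mentioned in the question
n_states <- 60

# Generate one colour per state with a base R (grDevices) palette function
cpal <- grDevices::rainbow(n_states)

# Count distinct colours; fewer than n_states would mean that some states
# are indistinguishable on the plot
n_distinct <- length(unique(cpal))

print(n_distinct)
```

A vector like this can be supplied through the `cpal` argument of seqdef(), whose length is expected to match the size of the alphabet.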

Fitting a VLMC to very long sequences

霸气de小男生 posted on 2019-12-11 03:39:26
Question: I am trying to fit a VLMC to a dataset where the longest sequence is 296 states. I do it as shown below:

    # Load libraries
    library(PST)
    library(RCurl)
    library(TraMineR)

    # Load and transform data
    x <- getURL("https://gist.githubusercontent.com/aronlindberg/08228977353bf6dc2edb3ec121f54a29/raw/241ef39125ecb55a85b43d7f4cd3d58f617b2ecf/challenge_level.csv")
    data <- read.csv(text = x)
    data.seq <- seqdef(data[,2:ncol(data)], missing = NA, right = NA, nr = "*")
    S1 <- pstree(data.seq, ymin = 0.01, lik

How to identify sequences within each cluster?

六眼飞鱼酱① posted on 2019-12-11 02:27:25
Question: Using the biofam dataset that comes as part of TraMineR:

    library(TraMineR)
    data(biofam)
    lab <- c("P","L","M","LM","C","LC","LMC","D")
    biofam.seq <- seqdef(biofam[,10:25], states=lab)
    head(biofam.seq)

         Sequence
    1167 P-P-P-P-P-P-P-P-P-LM-LMC-LMC-LMC-LMC-LMC-LMC
    514  P-L-L-L-L-L-L-L-L-L-L-LM-LMC-LMC-LMC-LMC
    1013 P-P-P-P-P-P-P-L-L-L-L-L-LM-LMC-LMC-LMC
    275  P-P-P-P-P-L-L-L-L-L-L-L-L-L-L-L
    2580 P-P-P-P-P-L-L-L-L-L-L-L-L-LMC-LMC-LMC
    773  P-P-P-P-P-P-P-P-P-P-P-P-P-P-P-P

I can perform a cluster analysis:
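Once a cluster membership vector exists, listing the members of each cluster is a matter of splitting the case identifiers by that vector. The sketch below shows the idea in base R on toy one-dimensional data. The data, the linkage method, and the number of clusters are assumptions for illustration, since the question's own clustering code is cut off.

```r
# Toy data: five cases on one numeric variable (illustrative values)
x <- matrix(c(1, 2, 10, 11, 20), ncol = 1,
            dimnames = list(c("s1", "s2", "s3", "s4", "s5"), NULL))

# Hierarchical clustering on the pairwise distances, cut into 3 clusters
hc <- hclust(dist(x), method = "average")
cl <- cutree(hc, k = 3)

# List the case identifiers belonging to each cluster
members <- split(names(cl), cl)

print(members)
```

With a sequence object, the same membership vector could be used to subset rows, for example `biofam.seq[cl == 1, ]`, provided `cl` was computed on distances between those same sequences.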

Strange number of subsequences?

我只是一个虾纸丫 posted on 2019-12-10 19:10:16
Question: I have a sequence object created like this:

    subsequences <- function(data){
      slmax <- max(data$time)
      sequences.seqe <- seqecreate(data)
      sequences.sts <- seqformat(data, from="SPELL", to="DSS", begin="time", end="end", id="id", status="event", limit=slmax)
      sequences.sts <- seqdef(sequences.sts, right = "DEL", left = "DEL", gaps = "DEL")
      (sequences.sts)
    }
    data <- subsequences(data