Question
I am looking to use the arulesSequences package in R. However, I have no idea how to coerce my data frame into an object that this package can work with.
Here is a toy dataset that replicates my data structure:
ids <- c(rep("X", 5), rep("Y", 5), rep("Z", 5))
seq <- rep(1:5,3)
val <- sample(LETTERS, 15, replace = TRUE)
df <- data.frame(ids, seq, val)
df
   ids seq val
1    X   1   T
2    X   2   H
3    X   3   V
4    X   4   A
5    X   5   X
6    Y   1   D
7    Y   2   B
8    Y   3   A
9    Y   4   D
10   Y   5   P
11   Z   1   Q
12   Z   2   R
13   Z   3   W
14   Z   4   W
15   Z   5   P
Any help will be greatly appreciated.
Answer 1:
Convert every column of the data frame to a factor:
library(arules)
df_fact = data.frame(lapply(df, as.factor))
Build the transactions object:
df_trans = as(df_fact, 'transactions')
Test it:
itemFrequencyPlot(df_trans, support = 0.1, cex.names=0.8)
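A minimal sketch of what this coercion produces on the question's toy data. To keep it reproducible, the val column is copied from the printed output in the question instead of being sampled. Each row becomes its own transaction whose items are labelled "column=level" (for example "ids=X" or "val=T"):

```r
library(arules)

# Toy data from the question; 'val' is copied from the printed output
# instead of sampled, so the example is reproducible.
df <- data.frame(
  ids = rep(c("X", "Y", "Z"), each = 5),
  seq = rep(1:5, 3),
  val = c("T", "H", "V", "A", "X",
          "D", "B", "A", "D", "P",
          "Q", "R", "W", "W", "P")
)

# Convert every column to a factor so each column=level pair becomes an item.
df_fact <- data.frame(lapply(df, as.factor))

# One transaction per row, holding three items (one per column).
df_trans <- as(df_fact, "transactions")

summary(df_trans)
itemLabels(df_trans)
```

Because every row is an independent transaction without sequenceID or eventID information, this object suits apriori() or eclat() rather than cspade().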
Answer 2:
By using read_baskets() from arulesSequences:
library(arulesSequences)
read_baskets(con = "filePath.txt",
             sep = " ",
             info = c("sequenceID", "eventID", "SIZE"))
In practice, this means exporting the data frame to a text file and re-importing it with read_baskets(). The info argument names the leading columns of each line: sequenceID, eventID and an optional event-size column (SIZE). The remaining fields on each line are the items of that event.
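The export step can be sketched in base R. This assumes the question's column names (ids, seq, val) and writes one line per event in the layout "sequenceID eventID SIZE item1 item2 ..." with no header row. write_basket_file is a hypothetical helper, not part of any package. The sequence IDs are converted to integers (X = 1, Y = 2, Z = 3), and the val column is copied from the question's printed output:

```r
# Toy data from the question ('val' copied from the printed output).
df <- data.frame(
  ids = rep(c("X", "Y", "Z"), each = 5),
  seq = rep(1:5, 3),
  val = c("T", "H", "V", "A", "X",
          "D", "B", "A", "D", "P",
          "Q", "R", "W", "W", "P")
)

# Hypothetical helper: one line per (sequence, event) pair, no header row.
write_basket_file <- function(data, file) {
  sid <- as.integer(factor(data$ids))  # integer sequence IDs
  eid <- as.integer(data$seq)          # order of events within a sequence
  keys <- unique(data.frame(sid, eid))
  keys <- keys[order(keys$sid, keys$eid), ]  # sort by sequence, then event
  lines <- vapply(seq_len(nrow(keys)), function(i) {
    in_event <- sid == keys$sid[i] & eid == keys$eid[i]
    items <- unique(as.character(data$val[in_event]))  # no repeated items
    paste(keys$sid[i], keys$eid[i], length(items), paste(items, collapse = " "))
  }, character(1))
  writeLines(lines, file)
  invisible(file)
}

basket_file <- tempfile(fileext = ".txt")
write_basket_file(df, basket_file)
readLines(basket_file)[1:3]  # "1 1 1 T" "1 2 1 H" "1 3 1 V"
```

With arulesSequences loaded, the file can then be re-imported as described above: `seqs <- read_baskets(con = basket_file, sep = " ", info = c("sequenceID", "eventID", "SIZE"))`.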
Answer 3:
What worked for me was adding an "order" column that holds an order ranking rather than a time value. You just have to be very specific about the naming convention: name the "group" or "ordered basket #" variable sequenceID, and call the ranking or ordering variable eventID.
Another thing that helped me (and had me scratching my head for a long time) was that read_baskets() seemed to need me to specify the column names explicitly:
read_baskets(con = "filePath.txt", sep = " ", info = c("sequenceID", "eventID", "SIZE"))
Even though the help page makes the info = c(...) argument look like an optional header, for sequence data it is not: without it, the ID columns are read as items. I had to remove the header row from my file and give the column names in the read_baskets() call instead, or I ran into problems.
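A sketch of the full headerless-file route on the question's toy data. It assumes one item per event (true for the toy data, hence SIZE = 1), integer sequence IDs, and an arbitrary support threshold of 0.5. The file is written without a header, and the column names are passed through info as this answer recommends:

```r
library(arules)
library(arulesSequences)

# Toy data from the question ('val' copied from the printed output).
df <- data.frame(
  ids = rep(c("X", "Y", "Z"), each = 5),
  seq = rep(1:5, 3),
  val = c("T", "H", "V", "A", "X",
          "D", "B", "A", "D", "P",
          "Q", "R", "W", "W", "P")
)

# sequenceID groups the events of one sequence; eventID gives their order.
baskets <- data.frame(
  sequenceID = as.integer(factor(df$ids)),
  eventID = as.integer(df$seq),
  SIZE = 1L,
  item = df$val
)
baskets <- baskets[order(baskets$sequenceID, baskets$eventID), ]

# Write the file WITHOUT a header row; the names go into 'info' instead.
basket_file <- tempfile(fileext = ".txt")
write.table(baskets, basket_file, sep = " ", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

seqs <- read_baskets(con = basket_file, sep = " ",
                     info = c("sequenceID", "eventID", "SIZE"))

# Frequent sequences contained in at least half of the sequences.
frequent <- cspade(seqs, parameter = list(support = 0.5),
                   control = list(verbose = FALSE))
as(frequent, "data.frame")
```

Working the data by hand, only A and P appear in two of the three sequences, and A followed by P appears only in Y. At support = 0.5 this should leave just the two single-item sequences <{A}> and <{P}>.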
Answer 4:
Instead of coercing the data frame directly, what worked best for me was to split the values into one basket per ID and then convert the list to transactions:
baskets <- split(as.character(df$val), df$ids)
trans <- as(baskets, "transactions")
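A minimal sketch of the split approach on the question's toy data (val copied from the printed output). One assumption here is that repeated items inside a basket should be dropped with unique(), because a transaction cannot hold the same item twice. Note that the event order inside each basket is lost, so the result is a set of itemsets for apriori() or eclat() rather than sequences for cspade():

```r
library(arules)

# Toy data from the question ('val' copied from the printed output).
df <- data.frame(
  ids = rep(c("X", "Y", "Z"), each = 5),
  seq = rep(1:5, 3),
  val = c("T", "H", "V", "A", "X",
          "D", "B", "A", "D", "P",
          "Q", "R", "W", "W", "P")
)

# One basket per ID; unique() drops the repeated D in Y and W in Z.
baskets <- lapply(split(as.character(df$val), df$ids), unique)

# Coerce the named list to transactions (one transaction per ID).
trans <- as(baskets, "transactions")

size(trans)     # number of distinct items per basket
inspect(trans)
```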
Answer 5:
You have to first turn your items into transactions, so just coerce the column of items:
trans = as(data.frame(val = as.factor(df$val)), "transactions")
Then you can add the sequence information to your transactions object:
trans@itemsetInfo$transactionID = NULL
trans@itemsetInfo$sequenceID = as.integer(factor(df$ids))
trans@itemsetInfo$eventID = df$seq
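A sketch of the same idea carried through to cspade(). It uses arules' transactionInfo<- replacement function instead of writing to the @itemsetInfo slot directly. Three choices here are assumptions rather than requirements stated in the answer: integer sequence IDs, sorting the rows by sequence and then event, and a support threshold of 0.5. As before, val is copied from the question's printed output:

```r
library(arules)
library(arulesSequences)

# Toy data from the question ('val' copied from the printed output).
df <- data.frame(
  ids = rep(c("X", "Y", "Z"), each = 5),
  seq = rep(1:5, 3),
  val = c("T", "H", "V", "A", "X",
          "D", "B", "A", "D", "P",
          "Q", "R", "W", "W", "P")
)
# Sort events by sequence, then by event order.
df <- df[order(df$ids, df$seq), ]

# One transaction per row; items are labelled "val=<letter>".
trans <- as(data.frame(val = factor(df$val)), "transactions")

# Replace the default transactionID column with the sequence information.
# Integer IDs are an assumption chosen as the safe option for cspade().
transactionInfo(trans) <- data.frame(
  sequenceID = as.integer(factor(df$ids)),
  eventID = as.integer(df$seq)
)

frequent <- cspade(trans, parameter = list(support = 0.5),
                   control = list(verbose = FALSE))
as(frequent, "data.frame")
```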
Source: https://stackoverflow.com/questions/13022102/arules-sequence-mining-in-r