Question
I have a data frame that I want to convert into transaction data so that I can do market basket analysis with the arules package in R. I researched how to convert a data frame to transactions, e.g. (How to prep transaction data into basket for arules) and (Transform csv into transactions for arules), but the result I got was different from what I expected.
dput(df)
structure(list(Transaction_ID = c("A001", "A002", "A003", "A004", "A005", "A006"),
Fruits = c(NA, "Apple", "Orange", NA, "Pear", "Grape"),
Vegetables = c(NA, NA, NA, "Potato", NA, "Yam"),
Personal = c("ToothP", "ToothP", NA, "ToothB", "ToothB", NA),
Drink = c("Coff", NA, "Coff", "Milk", "Milk", "Coff"),
Other = c(NA, NA, NA, NA, "Promo", NA)),
.Names = c("Transaction_ID", "Fruits", "Vegetables", "Personal", "Drink", "Other"),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L))
Below is the structure of my data frame:
Transaction_ID Fruits Vegetables Personal Drink Other
A001 NA NA ToothP Coff NA
A002 Apple NA ToothP NA NA
A003 Orange NA NA Coff NA
A004 NA Potato ToothB Milk NA
A005 Pear NA ToothB Milk Promo
A006 Grape Yam NA Coff NA
Class of each column:
sapply(df, class)
Transaction_ID Fruits Vegetables Personal Drink Other
"character" "character" "character" "character" "character" "character"
Converting the data frame to transaction data:
data <- as(split(df[,"Fruits"], df[,"Vegetables"],df[,"Personal"], df[,"Drink"], df[,"Other"]), "transactions")
inspect(data)
The result I got:
[1] {NA,NA,ToothP,Coff,NA}
[2] {Apple,NA,ToothP,NA,NA}
[3] {Orange,NA,NA,Coff,NA}
[4] {NA,Potato,ToothB,Milk,NA}
[5] {Pear,NA,ToothB,Milk,Promo}
[6] {Grape,Yam,NA,Coff,NA}
The data frame was converted to transactions, but is there any way to remove the NA entries? If they stay in the transaction list, each NA is treated as an item.
Answer 1:
Ogustari is right. Here is the complete code that also handles the transaction IDs.
library("arules")
library("dplyr") ### for tbl_df
df <- structure(list(Transaction_ID = c("A001", "A002", "A003", "A004", "A005", "A006"),
Fruits = c(NA, "Apple", "Orange", NA, "Pear", "Grape"),
Vegetables = c(NA, NA, NA, "Potato", NA, "Yam"),
Personal = c("ToothP", "ToothP", NA, "ToothB", "ToothB", NA),
Drink = c("Coff", NA, "Coff", "Milk", "Milk", "Coff"),
Other = c(NA, NA, NA, NA, "Promo", NA)),
.Names = c("Transaction_ID", "Fruits", "Vegetables", "Personal", "Drink", "Other"),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L))
### remove transaction IDs
tid <- as.character(df[["Transaction_ID"]])
df <- df[,-1]
### make all columns factors
for(i in 1:ncol(df)) df[[i]] <- as.factor(df[[i]])
trans <- as(df, "transactions")
### set transactionIDs
transactionInfo(trans)[["transactionID"]] <- tid
inspect(trans)
items transactionID
[1] {Personal=ToothP,Drink=Coff} A001
[2] {Fruits=Apple,Personal=ToothP} A002
[3] {Fruits=Orange,Drink=Coff} A003
[4] {Vegetables=Potato,Personal=ToothB,Drink=Milk} A004
[5] {Fruits=Pear,Personal=ToothB,Drink=Milk,Other=Promo} A005
[6] {Fruits=Grape,Vegetables=Yam,Drink=Coff} A006
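If plain item labels without the column prefix are preferred (as in the baskets shown in the question), one possible alternative is to build a named list of per-row item vectors with the NA entries dropped, and coerce that list instead. This is only a sketch and not part of the answer above: it uses a plain data.frame rather than a tibble, `items` is a name chosen here for illustration, and it assumes that arules uses the list names as transaction IDs and that no item is repeated within a row.

```r
library("arules")

# Same data as in the question, as a plain data.frame (assumption: no tibble needed)
df <- data.frame(
  Transaction_ID = c("A001", "A002", "A003", "A004", "A005", "A006"),
  Fruits     = c(NA, "Apple", "Orange", NA, "Pear", "Grape"),
  Vegetables = c(NA, NA, NA, "Potato", NA, "Yam"),
  Personal   = c("ToothP", "ToothP", NA, "ToothB", "ToothB", NA),
  Drink      = c("Coff", NA, "Coff", "Milk", "Milk", "Coff"),
  Other      = c(NA, NA, NA, NA, "Promo", NA),
  stringsAsFactors = FALSE
)

# One character vector per row, keeping only the non-NA entries
items <- lapply(seq_len(nrow(df)), function(i) {
  x <- unlist(df[i, -1], use.names = FALSE)
  x[!is.na(x)]
})

# Name the list elements by transaction ID
names(items) <- df[["Transaction_ID"]]

# Coerce the list of item vectors to transactions
trans <- as(items, "transactions")
inspect(trans)
```

With this approach the labels are the raw values (e.g. `ToothP` instead of `Personal=ToothP`), so the information about which column an item came from is lost.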
Answer 2:
I can propose this solution, although I do not know whether it is the one you are looking for.
dput(df)
df <- data.frame(structure(list(Transaction_ID = as.factor(c("A001", "A002", "A003", "A004", "A005", "A006")),
Fruits = as.factor(c(NA, "Apple", "Orange", NA, "Pear", "Grape")),
Vegetables = as.factor(c(NA, NA, NA, "Potato", NA, "Yam")),
Personal = as.factor(c("ToothP", "ToothP", NA, "ToothB", "ToothB", NA)),
Drink = as.factor(c("Coff", NA, "Coff", "Milk", "Milk", "Coff")),
Other = as.factor(c(NA, NA, NA, NA, "Promo", NA))),
.Names = c("Transaction_ID", "Fruits", "Vegetables", "Personal", "Drink", "Other"),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L)))
Class of each column. Note that the classes are all "factor":
sapply(df, class)
Transaction_ID Fruits Vegetables Personal Drink Other
"factor" "factor" "factor" "factor" "factor" "factor"
Convert data frame to transaction data
data <- as(df, "transactions")
inspect(data)
The result I got:
items transactionID
[1] {Transaction_ID=A001,
Personal=ToothP,
Drink=Coff} 1
[2] {Transaction_ID=A002,
Fruits=Apple,
Personal=ToothP} 2
[3] {Transaction_ID=A003,
Fruits=Orange,
Drink=Coff} 3
[4] {Transaction_ID=A004,
Vegetables=Potato,
Personal=ToothB,
Drink=Milk} 4
[5] {Transaction_ID=A005,
Fruits=Pear,
Personal=ToothB,
Drink=Milk,
Other=Promo} 5
[6] {Transaction_ID=A006,
Fruits=Grape,
Vegetables=Yam,
Drink=Coff} 6
I found part of the solution in "convert data frame in r to transaction or an itemMatrix". Moreover, it seems that your command
data <- as(split(df[,"Fruits"], df[,"Vegetables"],df[,"Personal"], df[,"Drink"], df[,"Other"]), "transactions")
inspect(data)
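only works for a data.frame containing just two columns (a transaction ID and an item), since split() expects a single vector of values and a grouping factor.
Source: https://stackoverflow.com/questions/45773861/r-arules-convert-dataframe-into-transactions-and-remove-na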
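To illustrate the two-column case: the wide table can first be reshaped into a long (ID, item) form, after which split() groups the items by transaction ID. The following is a sketch under that assumption, not a tested recipe from the linked posts; `long` and `baskets` are illustrative names, and a plain data.frame stands in for the tibble.

```r
library("arules")

# Same data as in the question, as a plain data.frame
df <- data.frame(
  Transaction_ID = c("A001", "A002", "A003", "A004", "A005", "A006"),
  Fruits     = c(NA, "Apple", "Orange", NA, "Pear", "Grape"),
  Vegetables = c(NA, NA, NA, "Potato", NA, "Yam"),
  Personal   = c("ToothP", "ToothP", NA, "ToothB", "ToothB", NA),
  Drink      = c("Coff", NA, "Coff", "Milk", "Milk", "Coff"),
  Other      = c(NA, NA, NA, NA, "Promo", NA),
  stringsAsFactors = FALSE
)

# Long format: one row per (transaction, item) pair.
# unlist() walks the item columns one after another, so the ID vector
# is repeated once per item column to stay aligned.
long <- data.frame(
  Transaction_ID = rep(df[["Transaction_ID"]], times = ncol(df) - 1),
  item = unlist(df[-1], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop the NA entries so they do not become items
long <- long[!is.na(long[["item"]]), ]

# split(values, grouping): exactly two arguments, one basket per transaction ID
baskets <- split(long[["item"]], long[["Transaction_ID"]])

trans <- as(baskets, "transactions")
inspect(trans)
```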