R Basket analysis using arules package with unique order number but duplicate order combinations
Just learning R. I'm trying to do a basket analysis using the arules pa
Ok, after hours of searching and reading all the pdfs I could find, I finally found the answer (and most helpful walkthrough of apriori/basket analysis ever!) in the DATA MINING Desktop Survival Guide by Graham Williams:
The read.transactions function can also read data from a file with transaction ID and a single item per line (using the format="single" option).
So there was no need to do all those transformations after import. I should have imported straight from the original csv file, specifying the "single" format option instead of "basket". I also had to make sure the file contained no column names and that each pairing of item type and order number appeared only once (for instance, if a person ordered two items from the "Grocery" category, this needs to be represented on one row). For the "single" format, the cols argument gives the transaction ID column first and the item column second, so cols=c(2,1) means column 2 contains the order number and column 1 contains the item (ItemType). If your file has the order number in column 1, use cols=c(1,2) instead.
tr <- read.transactions(file='dataset.csv', format='single', sep=',', cols=c(2,1))
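To make the file layout concrete, here is a minimal sketch of the same call against a small temporary file. The sample rows and the csv_path name are made up for illustration, and the rm.duplicates = TRUE argument is an addition not used in the call above; it assumes a reasonably recent version of arules, where read.transactions applies it to the "single" format as well.

```r
library(arules)  # assumes the arules package is installed

# Hypothetical sample file: item type in column 1, order number in column 2,
# no header row. Order 1001 lists "Grocery" twice.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Grocery,1001",
             "Grocery,1001",
             "Bakery,1001",
             "Dairy,1002"), csv_path)

# cols = c(2, 1): transaction IDs are in column 2, items in column 1.
# rm.duplicates = TRUE drops the repeated item within order 1001.
tr <- read.transactions(file = csv_path, format = "single", sep = ",",
                        cols = c(2, 1), rm.duplicates = TRUE)

inspect(tr)  # lists one itemset per order number
```

If the duplicates are already removed from the file, the rm.duplicates argument can be left at its default.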
You must remove the duplicates. If you are using a .csv file, run Data -> Remove Duplicates in Excel before processing it. arules throws an error when it finds duplicate items within a transaction, and that is why you are getting the error.
Another way is to use duplicated() on your itemset to find the duplicates and unique() to remove them.
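As a rough sketch of that approach in base R, assuming the order lines sit in a data frame (the orders data frame and its column names are hypothetical, not taken from the question):

```r
# Hypothetical order lines: order 1001 lists "Grocery" twice.
orders <- data.frame(
  OrderNumber = c(1001, 1001, 1001, 1002),
  ItemType    = c("Grocery", "Grocery", "Bakery", "Dairy"),
  stringsAsFactors = FALSE
)

# duplicated() flags each row that repeats an earlier
# (OrderNumber, ItemType) pair.
dup_rows <- duplicated(orders)

# unique() keeps the first occurrence of each row and drops the repeats;
# it is equivalent to orders[!dup_rows, ].
deduped <- unique(orders)
```

The deduped data frame then has at most one row per order number and item type, which is the shape the "single" format expects.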
Or, for a simpler approach, see this SO post:
Association analysis with duplicate transactions using arules package in R