Question
Within the apriori function, I want the result to contain only these two items on the LHS: HouseOwnerFlag=0 and HouseOwnerFlag=1. The RHS should contain only attributes from the column Product. For instance:
# lhs rhs support confidence lift
# 1 {HouseOwnerFlag=0} => {Product=SV 16xDVD M360 Black} 0.2500000 0.2500000 1.000000
# 2 {HouseOwnerFlag=1} => {Product=Adventure Works 26" 720p} 0.2500000 0.2500000 1.000000
# 3 {HouseOwnerFlag=0} => {Product=Litware Wall Lamp E3015 Silver} 0.1666667 0.3333333 1.333333
# 4 {HouseOwnerFlag=1} => {Product=Contoso Coffee Maker 5C E0900} 0.1666667 0.3333333 1.333333
Part of the problem is solved in this question: R arules, mine only rules from specific column
So now I use the following: rules <- apriori(sales, parameter=list(support =0.01, confidence =0.8, minlen=2), appearance = list(lhs=c("HouseOwnerFlag=0", "HouseOwnerFlag=1")))
Then I use this from that other SO question to ensure that only the Product column is on the RHS: inspect( subset( rules, subset = rhs %pin% "Product=" ) )
The outcome is like this:
# lhs rhs support confidence lift
# 1 {ProductKey=153, IncomeGroup=Moderate, BrandName=Adventure Works } => {Product=SV 16xDVD M360 Black} 0.2500000 0.2500000 1.000000
# 2 {ProductKey=176, MaritalStatus=M, ProductCategoryName=TV and Video } => {Product=Adventure Works 26" 720p} 0.2500000 0.2500000 1.000000
# 3 {BrandName=Southridge Video, NumberChildrenAtHome=0 } => {Product=Litware Wall Lamp E3015 Silver} 0.1666667 0.3333333 1.333333
# 4 {HouseOwnerFlag=1, BrandName=Southridge Video, ProductKey=170 } => {Product=Contoso Coffee Maker 5C E0900} 0.1666667 0.3333333 1.333333
So apparently the LHS can contain every possible column, not just HouseOwnerFlag as I specified. From other Stack Overflow questions, I see that I can put default="rhs" in the apriori function, like so: rules <- apriori(sales, parameter=list(support =0.001, confidence =0.5, minlen=2), appearance = list(lhs=c("HouseOwnerFlag=0", "HouseOwnerFlag=1"), default="rhs"))
Then upon inspecting (without the subset part, just inspect(rules)), there are far fewer rules (7) than before, but they do indeed contain only HouseOwnerFlag in the LHS:
# lhs rhs support confidence lift
# 1 {HouseOwnerFlag=0} => {MaritalStatus=S} 0.2500000 0.2500000 1.000000
# 2 {HouseOwnerFlag=1} => {Gender=M} 0.2500000 0.2500000 1.000000
# 3 {HouseOwnerFlag=0} => {NumberChildrenAtHome=0} 0.1666667 0.3333333 1.333333
# 4 {HouseOwnerFlag=1} => {Gender=M} 0.1666667 0.3333333 1.333333
However, there is nothing from the column Product on the RHS, so there is no point in inspecting it with subset, as of course it would return nothing. I tested it several times with different support values to see whether Product would appear, but the same 7 rules remain.
So my question is, how can I specify both the LHS (HouseOwnerFlag) and RHS (Product)? What am I doing wrong?
EDIT: You can reproduce this problem by downloading this test dataset from https://www.dropbox.com/s/tax5xalac5xgxtf/testdf.txt?dl=0
Mind you, I only took the first 20 rows from a huge dataset, so the output here won't have the same product names as the example I displayed above, unfortunately. But the problem remains the same: I want only HouseOwnerFlag=0 and/or HouseOwnerFlag=1 on the LHS and the column Product on the RHS.
Answer 1:
At first it seemed that one can't constrain lhs and rhs at once (I had not tried it before playing with your data), but you can use subset. EDIT: I was wrong; you can also constrain lhs and rhs at once, see Solution 2. I keep Solution 1 because in some cases it might be useful to compute a bigger set and then split by the left-hand side.
Solution 1:
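library(arules)
# Restrict the LHS to the two HouseOwnerFlag items; every other item may appear only on the RHS.
rules_sales <- apriori(sales,
    parameter = list(support = 0.001, confidence = 0.5, minlen = 2, maxlen = 2),
    appearance = list(lhs = c("HouseOwnerFlag=0", "HouseOwnerFlag=1"),
                      default = "rhs"))
# Keep only the rules whose RHS is a Product item.
rules_subset <- subset(rules_sales, (rhs %in% paste0("Product=", unique(sales$Product))))
inspect(rules_subset)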
gives:
lhs rhs support confidence lift
1 {HouseOwnerFlag=0} => {Product=SV DVD Movies E100 Yellow} 0.05 0.5 10
2 {HouseOwnerFlag=0} => {Product=Fabrikam Refrigerator 4.6CuFt E2800 Grey} 0.05 0.5 5
3 {HouseOwnerFlag=1} => {Product=Contoso SLR Camera M144 Gold} 0.10 0.5 5
But you should be careful about your low support:
Warning in apriori(sales, parameter = list(support = 0.001, confidence = 0.5, :
You chose a very low absolute support count of 0. You might run out of memory! Increase minimum support.
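The zero in that warning can be reproduced by hand. This is a minimal sketch, assuming the 20-row test dataset from the question and assuming that the absolute count is the relative support times the number of transactions, truncated to an integer (the truncation is an assumption about how the count is derived, not something the warning states):

```r
n_transactions <- 20      # rows in the test dataset from the question
min_support    <- 0.001   # relative minimum support passed to apriori

# Assumed derivation: relative support * number of transactions, truncated.
abs_count <- as.integer(min_support * n_transactions)
abs_count

# Smallest relative support that still requires one supporting row.
support_for_one_row <- 1 / n_transactions
support_for_one_row
```

With 20 rows, any support below 0.05 asks for less than one supporting transaction, which matches the smallest support value (0.05) in the output above.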
Solution 2:
I was tricked by the definition of the parameter default. Using lhs and rhs at once tells each item assigned to one of them that it can only be used on that side. The parameter default is automatically set to "both", so all other items not listed in lhs/rhs can be used on both sides. (An explanation of the appearance parameter as implemented in the R package: http://www.inside-r.org/node/86290. I realised that it must be possible when reading the manual of the original C implementation: http://www.borgelt.net/doc/apriori/apriori.html#appearin.) You have to set default="none"; then you can constrain lhs and rhs without using a subset later.
rules_sales <- apriori(sales,
    parameter = list(support = 0.001, confidence = 0.5, minlen = 2, maxlen = 2),
    appearance = list(lhs = c("HouseOwnerFlag=0", "HouseOwnerFlag=1"),
                      rhs = paste0("Product=", unique(sales$Product)),
                      default = "none"))
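To try Solution 2 end to end without the original data, here is a minimal sketch on a made-up data frame. The `toy` data, its values, and the thresholds are hypothetical and chosen only so that a few rules survive; the column names mirror the question.

```r
library(arules)

# Hypothetical toy data; every column must be a factor to be turned into items.
toy <- data.frame(
  HouseOwnerFlag = factor(c(0, 0, 0, 0, 1, 1, 1, 1)),
  Gender         = factor(c("M", "F", "M", "F", "M", "F", "M", "F")),
  Product        = factor(c("A", "A", "A", "B", "B", "B", "B", "A"))
)

# Item labels have the form "<column>=<level>".
lhs_items <- paste0("HouseOwnerFlag=", levels(toy$HouseOwnerFlag))
rhs_items <- paste0("Product=", levels(toy$Product))

# default = "none": items not listed (here Gender=...) appear on neither side.
toy_rules <- apriori(toy,
  parameter  = list(support = 0.1, confidence = 0.5, minlen = 2, maxlen = 2),
  appearance = list(lhs = lhs_items, rhs = rhs_items, default = "none"),
  control    = list(verbose = FALSE))

inspect(toy_rules)
```

Building the item labels from the factor levels keeps the appearance lists in sync with the data, in the same spirit as the paste0 call in Solution 2.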
Answer 2:
I am very late to the party, but as I am also playing with the package now, let me share my thoughts in case they are helpful to someone.
The rules included in the output are the ones that satisfy the support and confidence parameters. So, if you don't get any rules in the format you expect, try relaxing these constraints: lower support, lower confidence. The rhs, as far as I have found, can only contain one item, so you could restrict this part to the items you want to appear (Product) in order to speed up rule generation. I haven't tried this on your specific dataset, but I think it is general advice that should work in all cases.
Answer 3:
Please try the solution below:
rules_subset <- subset(rules, (lhs %oin% c("HouseOwnerFlag=0", "HouseOwnerFlag=1") & rhs %pin% c("Product=") ))
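A minimal sketch of this post-filtering approach on a made-up data frame (the `toy` data and the thresholds are hypothetical). Here `%oin%` keeps rules whose LHS contains only the listed items, and `%pin%` matches RHS items by partial label; the sketch assumes an arules version that provides `%oin%`.

```r
library(arules)

# Hypothetical toy data; every column must be a factor to be turned into items.
toy <- data.frame(
  HouseOwnerFlag = factor(c(0, 0, 0, 0, 1, 1, 1, 1)),
  Gender         = factor(c("M", "F", "M", "F", "M", "F", "M", "F")),
  Product        = factor(c("A", "A", "A", "B", "B", "B", "B", "A"))
)

# Mine without any appearance restriction, so every column can show up anywhere.
all_rules <- apriori(toy,
  parameter = list(support = 0.1, confidence = 0.5, minlen = 2),
  control   = list(verbose = FALSE))

# Filter afterwards: LHS only from HouseOwnerFlag, RHS only from Product.
toy_subset <- subset(all_rules,
  lhs %oin% c("HouseOwnerFlag=0", "HouseOwnerFlag=1") & rhs %pin% "Product=")

inspect(toy_subset)
```

The trade-off is that the unrestricted rule set is mined first, which can be large on real data, so this fits best when the full set is wanted anyway.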
Source: https://stackoverflow.com/questions/27926131/how-to-get-items-for-both-lhs-and-rhs-for-only-specific-columns-in-arules