apriori | 易学教程

How to find frequent itemset irrespective of attribute name?

Question: I have a dataset (CSV file) and want to find frequent itemsets using the Apriori algorithm.

    col1, col2, col3
    bread, butter, ?
    coke, bread, butter

I am using WEKA for this purpose. The output is in the following format:

    ...
    Large Itemsets L(2):
    col1=bread col2= butter 1
    col1=coke col2= bread 1
    col1=coke col3= butter 1
    col2= bread col3= butter 1
    ...

But the output that I want is:

    bread, butter 2

Basically, the desired output is independent of the column the items belong to. How can I achieve this kind of
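One way to get column-independent counts is to treat each row as an unordered set of items before counting. The following is a minimal Python sketch of that idea, not a WEKA feature; the `rows` data mirrors the CSV in the question, and the helper name `count_itemsets` and the `missing` marker are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical rows mirroring the CSV in the question; "?" marks a missing value.
rows = [
    ["bread", "butter", "?"],
    ["coke", "bread", "butter"],
]

def count_itemsets(rows, size, missing="?"):
    """Count itemsets of the given size, ignoring which column an item came from."""
    counts = Counter()
    for row in rows:
        # Strip whitespace, drop missing values, and deduplicate within the row.
        items = sorted({cell.strip() for cell in row if cell.strip() != missing})
        for itemset in combinations(items, size):
            counts[itemset] += 1
    return counts

pair_counts = count_itemsets(rows, 2)
print(pair_counts[("bread", "butter")])  # 2
```

Sorting the items in each row gives every itemset a single canonical order, so (bread, butter) and (butter, bread) are counted as the same pair.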

arules: How find the data matching an lhs(rule) in R or an SQL WHERE clause?

Question: I'm finding working with the arules package a bit tricky. I'm using the apriori algorithm to find association rules; something similar to an example in the arules documentation.

    data("AdultUCI")
    dim(AdultUCI)
    AdultUCI[1:2,]
    # Ignore everything from here to the last two lines, this is just data preparation
    ## remove attributes
    AdultUCI[["fnlwgt"]] <- NULL
    AdultUCI[["education-num"]] <- NULL
    ## map metric attributes
    AdultUCI[[ "age"]] <- ordered(cut(AdultUCI[[ "age"]], c(15,25,45,65,100)), labels

R arules, mine only rules from specific column

Question: I would like to mine specific rhs rules. There is an example in the documentation which demonstrates that this is possible, but only for a specific case (as we see below). First, a data set to illustrate my problem:

    input <- matrix( c( rep(10001,6) , rep(10002,3) , rep(10003,3), 100001,100002,100003,100004,100005,100006,100002,100003,100007,100002,100003,100008,rep('a',6),rep('b',6)), ncol=3)
    colnames(input) <- c(letters[1:3])
    input <- as.data.frame(input)

Now I can create rules:

    r <- apriori

Frequent itemset generation and classic algorithms

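Preface: Association rules are one of the most active research methods in data mining. Association rule mining searches all the details or transactions in a business system for rules that link one set of events or data items to another, in order to uncover information in the database that was previously unknown or uncertain. It focuses on identifying relationships between different fields in the data, and it is the most common form of local pattern mining in unsupervised learning systems.

In general, association rule mining is the process of discovering interesting associations or correlations in a large dataset: identifying the sets of attribute values that occur frequently, known as frequent itemsets, and then using those frequent itemsets to create rules that describe the associations.

The association rule mining problem:

Finding frequent itemsets: discovering all frequent itemsets is the basis for forming association rules. Given a user-specified minimum support, find all itemsets whose support is greater than or equal to Minsupport.

Generating association rules: given a user-specified minimum confidence, find, within each maximal frequent itemset, the association rules whose confidence is not less than Minconfidence.

How to find all frequent itemsets quickly and efficiently is the core problem of association rule mining, and an important measure of the efficiency of an association rule mining algorithm.

The classic approach to mining the complete set of frequent itemsets is to find the full collection of frequent itemsets. This includes the association rule algorithm based on breadth-first search, the Apriori algorithm (which finds all frequent itemsets through multiple iterations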
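The level-wise search for frequent itemsets under a minimum support threshold can be sketched roughly as follows. This is a simplified Python illustration rather than an optimized Apriori implementation; the function name `frequent_itemsets` and the toy `transactions` are assumptions introduced here, not part of the original article.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {s: sup for s, sup in current.items() if sup >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Candidates: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c: support(c) for c in candidates}
        current = {c: sup for c, sup in current.items() if sup >= min_support}
        result.update(current)
        k += 1
    return result

# Hypothetical toy data for illustration only.
transactions = [{"bread", "butter"}, {"bread", "butter", "coke"}, {"coke"}, {"bread"}]
result = frequent_itemsets(transactions, min_support=0.5)
print(result[frozenset({"bread", "butter"})])  # 0.5
```

Each pass only counts candidates whose subsets were all frequent in the previous pass, which is what keeps the number of support computations manageable.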

Creating k-itemsets from 2-itemsets

Question: I have written the following code to generate k-element itemsets from 2-element sets. The two-element sets are passed to candidateItemsetGen as clist1 and clist2.

    public static void candidateItemsetGen(ArrayList<Integer> clist1, ArrayList<Integer> clist2) {
        for(int i = 0; i < clist1.size(); i++) {
            for(int j = i+1; j < clist2.size(); j++) {
                for(int k = 0; k < clist1.size()-2; k++) {
                    int r = clist1.get(k).compareTo(clist2.get(k));
                    if(r == 0 && clist1.get(k)-1 == clist2.get(k)-1) {
                        **
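For comparison, the usual Apriori join step merges two frequent (k-1)-itemsets only when they share their first k-2 items, then prunes candidates that have an infrequent subset. Below is a minimal Python sketch of that technique; it is not the asker's Java code, and the function name `generate_candidates` and the sample itemsets are assumptions for illustration. It expects each itemset as a sorted tuple without duplicates.

```python
from itertools import combinations

def generate_candidates(frequent_prev):
    """Join frequent (k-1)-itemsets (sorted tuples) into candidate k-itemsets."""
    prev = sorted(frequent_prev)
    prev_set = set(prev)
    candidates = []
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            # Join only itemsets that agree on everything but the last item.
            if a[:-1] != b[:-1]:
                break  # prev is sorted, so later itemsets cannot share the prefix either
            candidate = a + (b[-1],)
            # Prune: every (k-1)-subset of the candidate must be frequent.
            if all(sub in prev_set
                   for sub in combinations(candidate, len(candidate) - 1)):
                candidates.append(candidate)
    return candidates

# Hypothetical frequent 2-itemsets.
two_itemsets = [(1, 2), (1, 3), (2, 3), (2, 4)]
three_itemsets = generate_candidates(two_itemsets)
print(three_itemsets)  # [(1, 2, 3)]
```

Here (2, 3) and (2, 4) join to (2, 3, 4), but that candidate is pruned because (3, 4) is not among the frequent 2-itemsets.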

cspade() R Error

Question: I am trying to mine rules from the events of cable modems. Linked is one file of thousands. When I try to run the cspade algorithm on the merged file of all devices (12 million rows), it spends hours chewing through RAM until it uses all 64 GB I have available. So I attempted to run the algorithm on the linked file for just one device, and I see the exact same thing happen. Since this subsample is only 2190 rows, I thought this was strange. Can someone explain why I'm not seeing results in a

R - arules apriori Error in length(obj) : Method length not implemented for class rules

Question: I am attempting to build an association rule set using apriori. I am using a different dataset, but the starwars dataset shows similar issues. Using arules, I was attempting to list the rules and apply an arulesViz plot. From my understanding, all strings must be run as factors and listed as transactions, and then apriori should function properly, but I get the output below after running the following code, and rules is not added to the environment:

    install.packages("arules")
    install.packages(

Is it possible to run apriori association rule in mysql statement?

Question: Database:

    Transaction#   Items List
    T1             butter
    T1             jam
    T2             butter
    T3             bread
    T3             ice cream
    T4             butter
    T4             jam

Given the above table, is it possible to run the apriori association rule in a MySQL statement? For example, the support of buys(T, butter) --> buys(T, jam) = 50%, because there are 4 transactions and T1, T4 satisfy the "support" rule. Can I just use a SQL statement to find such a result?

Answer 1: Yes, you can use SQL to find the support of a single item. But if you want to find itemsets containing more than
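The 50% figure in the question can be checked with a short calculation. The sketch below uses Python rather than SQL purely for illustration; the `rows` are copied from the table in the question, while the helper name `support` and the variable names are assumptions introduced here.

```python
from collections import defaultdict

# (transaction id, item) rows copied from the table in the question.
rows = [
    ("T1", "butter"), ("T1", "jam"),
    ("T2", "butter"),
    ("T3", "bread"), ("T3", "ice cream"),
    ("T4", "butter"), ("T4", "jam"),
]

# Group items into one basket per transaction.
baskets = defaultdict(set)
for tid, item in rows:
    baskets[tid].add(item)

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets.values()) / len(baskets)

rule_support = support({"butter", "jam"})             # T1 and T4 out of 4 transactions
rule_confidence = rule_support / support({"butter"})  # 2 of the 3 butter transactions
print(rule_support)  # 0.5
```

The confidence of butter --> jam works out to 2/3 here, since butter appears in T1, T2, and T4 but jam accompanies it only in T1 and T4.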

cspade() R Error

I am trying to mine rules from the events of cable modems. Linked is one file of thousands. When I try to run the cspade algorithm on the merged file of all devices (12 million rows), it spends hours chewing through RAM until it uses all 64 GB I have available. So I attempted to run the algorithm on the linked file for just one device, and I see the exact same thing happen. Since this subsample is only 2190 rows, I thought this was strange. Can someone explain why I'm not seeing results in a timely manner on this small data set? https://drive.google.com/file/d/0B6VvhxxLVGccVnhDNmVKUE0yaEk/view?usp

Using the apriori algorithm for recommendations

So a recent question made me aware of the rather cool apriori algorithm. I can see why it works, but what I'm not sure about is its practical uses. Presumably the main reason to compute related sets of items is to be able to provide recommendations for someone based on their own purchases (or owned items, etcetera). But how do you go from a set of related sets of items to individual recommendations? The Wikipedia article finishes: The second problem is to generate association rules from those large itemsets with the constraints of minimal confidence. Suppose one of the large itemsets is Lk, Lk =
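One possible way to turn association rules into individual recommendations is to fire every rule whose antecedent is contained in what the user already owns and rank the suggested items by rule confidence. The Python sketch below illustrates that idea under stated assumptions: the `rules` list, its confidence values, and the function name `recommend` are all hypothetical and not taken from the Wikipedia article.

```python
def recommend(owned, rules, top_n=3):
    """Suggest items from rules whose antecedent is contained in the owned set."""
    owned = set(owned)
    scores = {}
    for antecedent, consequent, confidence in rules:
        if set(antecedent) <= owned:
            # Only suggest items the user does not already own.
            for item in set(consequent) - owned:
                scores[item] = max(scores.get(item, 0.0), confidence)
    # Highest confidence first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

# Hypothetical rules as (antecedent, consequent, confidence) triples.
rules = [
    ({"bread"}, {"butter"}, 0.8),
    ({"bread", "butter"}, {"jam"}, 0.6),
    ({"coke"}, {"chips"}, 0.9),
]
suggestions = recommend({"bread", "butter"}, rules)
print(suggestions)  # ['jam']
```

Scoring by the best matching rule is only one design choice; summing confidences or weighting by lift would be equally plausible alternatives.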