Question
I have written the following code to generate k-element itemsets from 2-element itemsets. The two-element sets are passed to candidateItemsetGen as clist1 and clist2.
public static void candidateItemsetGen(ArrayList<Integer> clist1, ArrayList<Integer> clist2)
{
    for(int i = 0; i < clist1.size(); i++)
    {
        for(int j = i+1; j < clist2.size(); j++)
        {
            for(int k = 0; k < clist1.size()-2; k++)
            {
                int r = clist1.get(k).compareTo(clist2.get(k));
                if(r == 0 && clist1.get(k)-1 == clist2.get(k)-1)
                {
**                  candidateItemset.add(clist1.get(i), clist1.get(clist1.size()-1), clist2.get(clist2.size()-1));
                }
            }
        }
    }
    // return candidateItemset;
}
The condition for creating a k-itemset is that clist1(i) == clist2(i), where i = 1,...,k-2, and clist1(k-2) != clist2(k-2). However, there is an error on the line I have marked with **. How can I fix this? The intent is that this function generates candidate itemsets, which are then used again as input to generate further candidate itemsets.
Answer 1:
The add method in ArrayList takes at most two arguments (either add(E element) or add(int index, E element)), and you are passing in three. If you wish to add all three items, call add(Integer i) three times.
Also, if you want to return candidateItemset from the function, you must declare an ArrayList<Integer> return type and create the list:
public static ArrayList<Integer> candidateItemsetGen(ArrayList<Integer> clist1, ArrayList<Integer> clist2) {
    ArrayList<Integer> candidateItemset = new ArrayList<Integer>();
    for (int i = 0; i < clist1.size(); i++) {
        for (int j = i + 1; j < clist2.size(); j++) {
            for (int k = 0; k < clist1.size() - 2; k++) {
                int r = clist1.get(k).compareTo(clist2.get(k));
                if (r == 0 && clist1.get(k) - 1 == clist2.get(k) - 1) {
                    candidateItemset.add(clist1.get(i));
                    candidateItemset.add(clist1.get(clist1.size() - 1));
                    candidateItemset.add(clist2.get(clist2.size() - 1));
                }
            }
        }
    }
    return candidateItemset;
}
If you want to add all three as a group of related values, store them together in a separate data structure and add that to candidateItemset (declared with the matching element type).
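One way to keep each candidate together as a group is to make the result a list of lists. The sketch below is only an illustration under stated assumptions: the class CandidateJoin and its methods join and generate are hypothetical names, each itemset is assumed to be a sorted list of integers, and the join rule is assumed to be the usual Apriori one (two itemsets of the same size that share every item except the last).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original question or answer.
class CandidateJoin {

    // Joins two sorted itemsets of equal size n into one itemset of size n + 1.
    // Returns null when they differ anywhere in their first n - 1 items,
    // or when the last item of a is not smaller than the last item of b.
    static List<Integer> join(List<Integer> a, List<Integer> b) {
        int n = a.size();
        if (n == 0 || b.size() != n) {
            return null;
        }
        for (int k = 0; k < n - 1; k++) {
            if (!a.get(k).equals(b.get(k))) {
                return null; // shared prefix differs
            }
        }
        // Requiring a's last item to be smaller yields each pair only once
        // and keeps the joined itemset sorted.
        if (a.get(n - 1).compareTo(b.get(n - 1)) >= 0) {
            return null;
        }
        List<Integer> joined = new ArrayList<>(a);
        joined.add(b.get(n - 1));
        return joined;
    }

    // Tries every ordered pair of itemsets and stores each candidate as its own list.
    static List<List<Integer>> generate(List<List<Integer>> itemsets) {
        List<List<Integer>> candidates = new ArrayList<>();
        for (List<Integer> a : itemsets) {
            for (List<Integer> b : itemsets) {
                List<Integer> joined = join(a, b);
                if (joined != null) {
                    candidates.add(joined);
                }
            }
        }
        return candidates;
    }
}
```

Because generate returns the same shape it accepts, its output can be passed back in to build the next level of candidates, which is what the question asks for.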
Answer 2:
You could optimize that code further if you take into account that each list of itemsets is sorted in lexical order.
For example, let's say that
clist1 = AB, AD, AF, AG, BC, FG
clist2 = BD, FE, FG, FH, FI
With your code, AB is compared with every itemset in clist2.
But you could optimize that by stopping right after BD, because B is larger than the A in AB according to lexical order. Therefore, no itemset after BD in clist2 will match AB.
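The early stop described above could look roughly like the following. This is a minimal sketch, not the SPMF implementation: the class SortedJoin and its method join are hypothetical names, itemsets are assumed to be non-empty strings of equal length whose characters are in sorted order, and both lists are assumed to be sorted lexically. A candidate is emitted only when the prefixes match and the last item of the first itemset is smaller than the last item of the second, which is one common convention.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class illustrating the early-termination idea.
class SortedJoin {

    // Joins itemsets from list1 with itemsets from list2 that share the same prefix
    // (all items except the last), abandoning the scan of list2 once prefixes get too large.
    static List<String> join(List<String> list1, List<String> list2) {
        List<String> candidates = new ArrayList<>();
        for (String a : list1) {
            String prefixA = a.substring(0, a.length() - 1);
            char lastA = a.charAt(a.length() - 1);
            for (String b : list2) {
                String prefixB = b.substring(0, b.length() - 1);
                int cmp = prefixB.compareTo(prefixA);
                if (cmp > 0) {
                    // list2 is sorted, so every later itemset also has a larger prefix.
                    break;
                }
                if (cmp < 0) {
                    continue; // prefix still too small, keep scanning
                }
                char lastB = b.charAt(b.length() - 1);
                if (lastA < lastB) {
                    candidates.add(a + lastB);
                }
            }
        }
        return candidates;
    }
}
```

With the two lists from the example, the inner loop for AB, AD, AF and AG stops at BD, the first itemset in clist2, instead of visiting all five.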
If you want to see the code of an optimized implementation of Apriori, you can check my open-source data mining library, SPMF.
Source: https://stackoverflow.com/questions/17125742/creating-k-itemsets-from-2-itemsets