Creating k-itemsets from 2-itemsets

血红的双手。 Submitted on 2019-12-08 10:23:17

Question


I have written the following code to generate k-element itemsets from 2-element itemsets. The two-element sets are passed to candidateItemsetGen as clist1 and clist2.

    public static void candidateItemsetGen(ArrayList<Integer> clist1, ArrayList<Integer> clist2) 
        {
            for(int i = 0; i < clist1.size(); i++)
            {
                for(int j = i+1; j < clist2.size(); j++)
                {
                   for(int k = 0; k < clist1.size()-2; k++)
                   {
                       int r = clist1.get(k).compareTo(clist2.get(k));
                       if(r == 0 && clist1.get(k)-1 == clist2.get(k)-1)
                       {
 **                           candidateItemset.add(clist1.get(i), clist1.get(clist1.size()-1), clist2.get(clist2.size()-1));
                       }
                   }
                }
            }
//    return candidateItemset;
        }

The condition for creating a k-itemset is that clist1(i) == clist2(i) for i = 1,...,k-2, and clist1(k-1) != clist2(k-1). However, the line I have marked with ** produces an error. How can I fix this? The idea is that this function generates candidate itemsets, which are then used again as input to generate further candidate itemsets.


Answer 1:


The add method in ArrayList takes a maximum of two arguments and you are passing in three. If you wish to add all three items, call add(Integer i) three times.
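As a quick illustration of the two add overloads mentioned above, here is a minimal sketch. The class name AddOverloads and the method demo are made up for this example; only ArrayList.add(E) and ArrayList.add(int, E) come from the standard library.

```java
import java.util.ArrayList;

class AddOverloads {
    // Hypothetical demo method: exercises both overloads of ArrayList.add.
    static ArrayList<Integer> demo() {
        ArrayList<Integer> list = new ArrayList<>();
        list.add(10);    // add(E): appends the element to the end
        list.add(0, 5);  // add(int index, E): inserts 5 at index 0
        list.add(20);    // appends again
        // list.add(1, 2, 3); // would not compile: there is no three-argument add
        return list;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [5, 10, 20]
    }
}
```

Note that the two-argument form treats its first argument as an index, not as a value, so it is not a way to add two items at once.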

Also, if you want to return candidateItemset from the function, you must declare ArrayList<Integer> as the return type and create the list inside the function:

    public static ArrayList<Integer> candidateItemsetGen(ArrayList<Integer> clist1, ArrayList<Integer> clist2) {
      ArrayList<Integer> candidateItemset = new ArrayList<Integer>();

      for (int i = 0; i < clist1.size(); i++) {
        for (int j = i + 1; j < clist2.size(); j++) {
          for (int k = 0; k < clist1.size() - 2; k++) {
            int r = clist1.get(k).compareTo(clist2.get(k));
            if (r == 0 && clist1.get(k) - 1 == clist2.get(k) - 1) {
              candidateItemset.add(clist1.get(i));
              candidateItemset.add(clist1.get(clist1.size() - 1));
              candidateItemset.add(clist2.get(clist2.size() - 1));
            }
          }
        }
      }

      return candidateItemset;
    }

If you want to add all three as a group of related values, store them together in a separate data structure and add that to candidateItemset (of correct type).
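One way to keep each group together is to make the result a list of lists. The sketch below is only an illustration, not the asker's code: it assumes every itemset is a sorted List<Integer>, it assumes the usual Apriori-style join (two itemsets of equal length are merged when they agree on everything except their last item), and the names GroupedCandidates and join are hypothetical. The signature of candidateItemsetGen is also changed here to take a single list of itemsets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GroupedCandidates {
    // Hypothetical helper: joins two sorted itemsets of equal length that
    // agree on every position except the last one.
    // Returns the merged itemset, or null when the pair does not qualify.
    static List<Integer> join(List<Integer> a, List<Integer> b) {
        int n = a.size();
        if (n == 0 || n != b.size()) {
            return null;
        }
        for (int k = 0; k < n - 1; k++) {
            if (!a.get(k).equals(b.get(k))) {
                return null; // prefixes differ
            }
        }
        if (a.get(n - 1) >= b.get(n - 1)) {
            return null; // last items are equal or out of order
        }
        List<Integer> merged = new ArrayList<>(a);
        merged.add(b.get(n - 1));
        return merged;
    }

    // Each candidate is stored as its own list, so related values stay grouped.
    static List<List<Integer>> candidateItemsetGen(List<List<Integer>> itemsets) {
        List<List<Integer>> candidateItemsets = new ArrayList<>();
        for (int i = 0; i < itemsets.size(); i++) {
            for (int j = i + 1; j < itemsets.size(); j++) {
                List<Integer> merged = join(itemsets.get(i), itemsets.get(j));
                if (merged != null) {
                    candidateItemsets.add(merged);
                }
            }
        }
        return candidateItemsets;
    }

    public static void main(String[] args) {
        List<List<Integer>> pairs = Arrays.asList(
            Arrays.asList(1, 2), Arrays.asList(1, 3), Arrays.asList(2, 3));
        System.out.println(candidateItemsetGen(pairs)); // prints [[1, 2, 3]]
    }
}
```

Because the output has the same shape as the input, it can be passed back into candidateItemsetGen to build the next level of candidates.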




Answer 2:


You could optimize that code further by taking advantage of the fact that each list of itemsets is sorted in lexical order.

For example, let's say that

clist1 = AB, AD, AF, AG, BC, FG

clist2 = BD, FE, FG, FH, FI

With your code, you will compare AB with all the itemsets of clist2.

But you could optimize that by stopping right after BD, because B is larger than the A in AB according to the lexical order. Therefore, no itemset after BD in clist2 will match AB.
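The early stop described above could be sketched roughly as follows. This is an illustration only and is not taken from SPMF: itemsets are modelled as strings such as "AB", "matching" is taken to mean sharing the same first item, and the names EarlyStopJoin and samePrefix are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EarlyStopJoin {
    // Hypothetical helper: returns the entries of sortedList that share their
    // first item with itemset. It assumes sortedList is in lexical order, so the
    // scan can stop at the first entry whose first item is larger.
    static List<String> samePrefix(String itemset, List<String> sortedList) {
        List<String> matches = new ArrayList<>();
        char prefix = itemset.charAt(0);
        for (String other : sortedList) {
            char otherPrefix = other.charAt(0);
            if (otherPrefix > prefix) {
                break; // everything after this point is larger too
            }
            if (otherPrefix == prefix) {
                matches.add(other);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> clist2 = Arrays.asList("BD", "FE", "FG", "FH", "FI");
        // The scan for AB stops at BD, the first entry, and finds no match.
        System.out.println(samePrefix("AB", clist2)); // prints []
        System.out.println(samePrefix("BC", clist2)); // prints [BD]
    }
}
```

The break is only safe if the list really is sorted; on an unsorted list it would silently skip valid matches.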

If you want to see the code of an optimized implementation of Apriori, you can check my open-source data mining library, SPMF.



Source: https://stackoverflow.com/questions/17125742/creating-k-itemsets-from-2-itemsets
