Is it possible to run apriori association rule in mysql statement?

久未见 提交于 2019-12-04 15:14:35

Yes, you can use SQL to find the support of a single item. But if you want to find itemsets containing more than one item, it would be difficult.

For example, if you had transactions containing several items and you want to find the support of "jam" with "milk" and "bread" appearing together, then it is better to use an algorithm like Apriori, or a faster algorithm like FPGrowth.

I arrive at 66% for the sample data you gave? There are 3 transactions for "butter", and out of those only 2 included "jam".

I used the following test table.

create table transactions(
   trans_no     varchar(5)  not null
  ,item         varchar(20) not null
  ,primary key(trans_no, item)
);

insert into transactions(trans_no, item)
values ('T1', 'butter')
      ,('T1', 'jam')
      ,('T2', 'butter')
      ,('T3', 'bread')
      ,('T3', 'ice cream')
      ,('T4', 'butter')
      ,('T4', 'jam');

The following is my attempt at an answer. The inner select find all transactions that included "butter". For each such transaction, it also sets a flag (bought_jam) saying whether that transaction also included "jam". (The having clause excludes transactions that includes "jam" but not "butter").
In the outer select, I basically count all rows (the count corresponds to the number of transactions including butter), and sums the jam flag, which corresponds to the number of transactions including both butter and jam.

select sum(bought_jam) as jams_bought
      ,count(*) as num_trans
      ,100 * sum(bought_jam) / count(*) as correlation_pct
  from (select trans_no
              ,max(case when item = 'jam' then 1 else 0 end) as bought_jam
          from transactions
         where item in('butter', 'jam')
         group 
            by trans_no
        having min(case when item = 'butter' then item end) = 'butter'
       ) butter_trans;

The query above gives the following result:

+-------------+-----------+-----------------+
| jams_bought | num_trans | correlation_pct |
+-------------+-----------+-----------------+
|           2 |         3 |         66.6667 |
+-------------+-----------+-----------------+
1 row in set (0.00 sec)

Let me know how this works out for you.

Edit:
The following query would give the same results, but is much easier to read. However, if the transactions table is very large, and the item = x isn't very selective (returns lots of rows), this query would almost certainly be slower.

select count(t2.trans_no) as jams_bought
      ,count(*) as num_trans
      ,count(t2.trans_no) / count(*) as correlation_pct
  from transactions t1
  left join transactions t2 on(t2.trans_no = t1.trans_no and t2.item = 'jam')
 where t1.item = 'butter';
标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!