I have a query which is something like this
SELECT
t.category,
tc.product,
tc.sub-product,
count(*) as sales
FROM tg t, ttc tc
WHERE t.value = tc.valu
There are probably reasons not to use analytical functions, but using analytical functions alone:
select am, rf, rfm, rownum_rf2, rownum_rfm
from
(
-- the 3rd level takes the subproduct ranks, and for each equally ranked
-- subproduct, it produces the product ranking
select am, rf, rfm, rownum_rfm,
row_number() over (partition by am, rownum_rfm order by rownum_rf) rownum_rf2
from
(
-- the 2nd level ranks (without ties) the products within
-- categories, and subproducts within products simultaneously
select am, rf, rfm,
row_number() over (partition by am order by count_rf desc) rownum_rf,
row_number() over (partition by am, rf order by count_rfm desc) rownum_rfm
from
(
-- innermost query counts the records by subproduct
-- using regular group-by. at the same time, it uses
-- the analytical sum() over to get the counts by product
select tg.am, ttc.rf, ttc.rfm,
count(*) count_rfm,
sum(count(*)) over (partition by tg.am, ttc.rf) count_rf
from tg inner join ttc on tg.value = ttc.value
group by tg.am, ttc.rf, ttc.rfm
) X
) Y
-- at level 3, we drop all but the top 5 subproducts per product
where rownum_rfm <= 5 -- top 5 subproducts
) Z
-- the filter on the final query retains only the top 10 products
where rownum_rf2 <= 10 -- top 10 products
order by am, rownum_rf2, rownum_rfm;
I used row_number() instead of rank() so you never get ties; in other words, ties are broken arbitrarily. This also doesn't work if the data is not dense enough (if any of the top 10 products has fewer than 5 subproducts, it may show subproducts from some other products instead). But if the data is dense (a large, established database), the query should work fine.
select am, rf, rfm, count_rf, count_rfm, rownum_rf, rownum_rfm
from
(
-- next join the top 10 products to the data again to get
-- the subproduct counts
select tg.am, tg.rf, ttc.rfm, tg.count_rf, tg.rownum_rf, count(*) count_rfm,
ROW_NUMBER() over (partition by tg.am, tg.rf order by count(*) desc) rownum_rfm
from (
-- first rank all the products
select tg.am, tg.value, ttc.rf, count(*) count_rf,
ROW_NUMBER() over (partition by tg.am order by count(*) desc) rownum_rf
from tg
inner join ttc on tg.value = ttc.value
group by tg.am, tg.value, ttc.rf
order by count_rf desc
) tg
inner join ttc on tg.value = ttc.value and tg.rf = ttc.rf
-- filter the inner query for the top 10 products only
where rownum_rf <= 10
group by tg.am, tg.rf, ttc.rfm, tg.count_rf, tg.rownum_rf
) X
-- filter where the subproduct rank is in top 5
where rownum_rfm <= 5
order by am, rownum_rf, rownum_rfm;
columns:
count_rf : count of sales by product
count_rfm : count of sales by subproduct
rownum_rf : product rank within category (row_number, without ties)
rownum_rfm : subproduct rank within product (without ties)
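On the remark about ties: a small sketch of how row_number(), rank() and dense_rank() treat tied counts. The demo rows and the aliases rn_arbitrary, rn_stable, rnk and dense_rnk are made up for illustration. Adding rfm as a second sort key is only one possible way to make the tie-break repeatable, not something the queries above do.

```sql
-- Hypothetical inline data: three subproducts, two of them tied on count.
with demo as (
  select 'A' rfm, 10 count_rfm from dual union all
  select 'B' rfm, 10 count_rfm from dual union all
  select 'C' rfm,  7 count_rfm from dual
)
select rfm, count_rfm,
       -- A and B receive 1 and 2, but which one gets 1 is not defined
       row_number() over (order by count_rfm desc)      rn_arbitrary,
       -- tie broken by name, so the numbering is repeatable: A=1, B=2, C=3
       row_number() over (order by count_rfm desc, rfm) rn_stable,
       -- ties share a rank and leave a gap: A=1, B=1, C=3
       rank()       over (order by count_rfm desc)      rnk,
       -- ties share a rank without a gap: A=1, B=1, C=2
       dense_rank() over (order by count_rfm desc)      dense_rnk
from demo
order by rn_stable;
```

With rank() or dense_rank(), a filter such as <= 5 can return more than 5 rows when counts are tied. row_number() always returns exactly 5, at the cost of picking among tied rows arbitrarily unless a tie-breaker column is added.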
It's guesswork, but you could probably start from something like this.
Some test data:
drop table category_sales;
create table category_sales (
category varchar2(14),
product varchar2(14),
subproduct varchar2(14),
sales number
);
begin
for cate in 1 .. 10 loop
for prod in 1 .. 20 loop
for subp in 1 .. 30 loop
insert into category_sales values (
'Cat ' || cate,
'Prod ' || cate||prod,
'Subp ' || cate||prod||subp,
trunc(dbms_random.value(1,30 + cate - prod + subp))
);
end loop; end loop; end loop;
end;
/
The actual query:
select * from (
select
category,
product,
subproduct,
sales,
category_sales,
product_sales,
top_subproduct,
-- Finding best products within category:
dense_rank () over (
partition by category
order by product_sales desc
) top_product
from (
select
-- Finding the best Subproducts within
-- category and product:
dense_rank () over (
partition by category,
product
order by sales desc
) top_subproduct,
-- Finding the sum(sales) within a
-- category and product
sum(sales) over (
partition by category,
product
) product_sales,
-- Finding the sum(sales) within
-- category
sum(sales) over (
partition by category
) category_sales,
category,
product,
subproduct,
sales
from
category_sales
)
)
where
-- Only best 10 Products
top_product <= 10 and
-- Only best 5 subproducts:
top_subproduct <= 5
-- "Best" categories first:
order by
category_sales desc,
top_product desc,
top_subproduct desc;
In that query, the column category_sales returns the sum of sales for the category of the record in which it appears. That means every record of the same category has the same category_sales. This column is needed to order the result set with the best (by sales) categories first (order by ... category_sales desc).
Similarly, product_sales is the sum of sales for a category-product combination. This column is used to find the best n (here: 10) products in each category (where top_product <= 10).
The column top_product is "created" with the dense_rank() over... analytical function. For the best product in a category it's 1, for the second best it's 2, and so on (hence the where top_product <= 10).
The column top_subproduct is calculated in the same way as top_product (that is, with dense_rank).
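To see why dense_rank() rather than rank() is the natural choice for top_product here, note that product_sales is repeated on every subproduct row of a product. The following is a minimal sketch with made-up inline data (two products, two subproducts each, a single category) and hypothetical aliases rank_product and dense_rank_product:

```sql
-- Hypothetical data: P1 totals 100, P2 totals 80.
with demo as (
  select 'P1' product, 'S1' subproduct, 60 sales from dual union all
  select 'P1', 'S2', 40 from dual union all
  select 'P2', 'S3', 50 from dual union all
  select 'P2', 'S4', 30 from dual
)
select product, subproduct, sales, product_sales,
       -- rank() counts the repeated rows: both P1 rows get 1, both P2 rows get 3
       rank()       over (order by product_sales desc) rank_product,
       -- dense_rank() numbers distinct totals: P1 rows get 1, P2 rows get 2
       dense_rank() over (order by product_sales desc) dense_rank_product
from (
  select product, subproduct, sales,
         -- same total on every row of a product
         sum(sales) over (partition by product) product_sales
  from demo
)
order by product_sales desc, sales desc;
```

With rank(), the second-best product already lands at position 3 in this tiny example, and with 30 subproducts per product (as in the test data) it would land at 31, so a filter like top_product <= 10 would keep only the best product. dense_rank() avoids this because it advances by one per distinct product_sales value. The flip side is that two products with exactly equal totals share a rank.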