Question
I have a table with 3 columns as shown below:
Input:
id   | num_rows
-----|---------
2502 | 330
3972 | 150
3988 | 200
4228 | 280
3971 | 510
52   | 1990
895  | 2000
812  | 5596
1600 | 7462
910  | 7526
638  | 11569
Desired output:
id   | num_rows | group_id
-----|----------|--------
2502 | 330      | 9
3972 | 150      | 9
3988 | 200      | 8
4228 | 280      | 8
3971 | 510      | 1
52   | 1990     | 2
895  | 2000     | 3
812  | 5596     | 4
1600 | 7462     | 5
910  | 7526     | 6
638  | 11569    | 7
id is a unique identifier for something, while num_rows corresponds to the number of rows each id has in another table.
I would like to group the rows (i.e., the id column) such that the sum of num_rows is never above a specified value (in this case, let's say 500).
Simply put: I want to group the ids into buckets, with no bucket holding more than 500 rows. If an id is bigger than the limit, then it gets its own separate group/bucket.
So far, I have been able to separate out the larger ids using the following query, but I am not able to create groups for the remaining subset of the ids.
SELECT id,
num_rows,
SUM(CASE WHEN num_rows > 500 THEN 1 ELSE 0 END) OVER(ORDER BY num_rows) AS group_id
FROM myTable;
id | num_rows | group_id
-----|----------|--------
2502 | 330 | 0
3972 | 150 | 0
3988 | 200 | 0
4228 | 280 | 0
3971 | 510 | 1
52 | 1990 | 2
895 | 2000 | 3
812 | 5596 | 4
1600 | 7462 | 5
910 | 7526 | 6
638 | 11569 | 7
Thank you.
Answer 1:
I personally would prefer a PL/SQL function for this task, but if you want to do it in pure SQL you can use the following query:
-- ROWNUM numbers the rows in whatever order the table returns them
WITH ord AS (SELECT id, num_rows, ROWNUM ord FROM myTable)
, rek(ord, id, num_rows, sum_rows, groupId) AS
  (SELECT ord, id, num_rows, num_rows, 1 FROM ord WHERE ord = 1
   UNION ALL
   SELECT rek.ord + 1
        , ord.id
        , ord.num_rows
          -- running sum: restart at the current row if the limit would be exceeded
        , CASE WHEN rek.sum_rows + ord.num_rows > 500
               THEN ord.num_rows
               ELSE rek.sum_rows + ord.num_rows
          END
          -- group id: open a new group whenever the running sum restarts
        , CASE WHEN rek.sum_rows + ord.num_rows > 500
               THEN rek.groupId + 1
               ELSE rek.groupId
          END
     FROM rek
     JOIN ord
       ON ord.ord = rek.ord + 1)
SELECT id, num_rows, groupId
  FROM rek
/
Note that this query does not search for matching entries to build groups whose sum stays below 500; it only walks the rows in order and starts a new group when the running sum would exceed the limit. Finding such combinations is closely related to the so-called knapsack problem (see https://en.wikipedia.org/wiki/Knapsack_problem), which is anything but easy to solve.
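For readers who want to check the running-sum logic outside the database, here is a minimal Python sketch of the same sequential grouping. It is only an illustration: the function name assign_groups and the hard-coded sample list are assumptions, and the rows are processed in the order they appear in the question rather than in whatever order ROWNUM would produce.

```python
def assign_groups(rows, limit=500):
    """Walk (id, num_rows) pairs in order and open a new group
    whenever adding the next row would push the running sum above limit.
    Hypothetical helper, not part of the original answer."""
    result = []
    group_id = 0
    running = 0
    for index, (row_id, num_rows) in enumerate(rows):
        if index == 0 or running + num_rows > limit:
            # Start a new group; the running sum restarts at this row.
            group_id += 1
            running = num_rows
        else:
            # Row still fits: keep the group and extend the running sum.
            running += num_rows
        result.append((row_id, num_rows, group_id))
    return result


# Sample data copied from the question, in the order shown there.
sample = [
    (2502, 330), (3972, 150), (3988, 200), (4228, 280), (3971, 510),
    (52, 1990), (895, 2000), (812, 5596), (1600, 7462), (910, 7526),
    (638, 11569),
]

groups = assign_groups(sample)
for row in groups:
    print(row)
```

On this sample the first two ids share one group, the next two share another, and every id above the limit ends up alone. The group numbers differ from the ones in the question's desired output, but the grouping itself is the same. A different input order can give a different, possibly less compact, grouping.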
Answer 2:
If you don't need the groups to follow the row sequence, you can simply derive the group from the number of rows, as follows:
SELECT id,
num_rows,
ceil(num_rows/500) AS group_id
FROM myTable;
This should give a new group_id for each block of 500 in num_rows (1 for 1 to 500, 2 for 501 to 1000, and so on).
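To see what this expression yields on the question's data, the calculation can be reproduced in a few lines of Python. This is only a sketch: the variable names band and totals are assumptions, and math.ceil stands in for the SQL ceil function.

```python
import math
from collections import defaultdict

# Sample data copied from the question.
sample = [
    (2502, 330), (3972, 150), (3988, 200), (4228, 280), (3971, 510),
    (52, 1990), (895, 2000), (812, 5596), (1600, 7462), (910, 7526),
    (638, 11569),
]
limit = 500

# Same arithmetic as ceil(num_rows/500): one value per id.
band = {row_id: math.ceil(num_rows / limit) for row_id, num_rows in sample}

# Total num_rows that ends up under each resulting group_id.
totals = defaultdict(int)
for row_id, num_rows in sample:
    totals[band[row_id]] += num_rows

for group_id in sorted(totals):
    print(group_id, totals[group_id])
```

On this sample, the four ids at or below 500 all receive group_id 1, with a combined total of 960, and ids 52 and 895 both receive group_id 4. The expression therefore groups ids by size band; it does not cap the sum of num_rows within a group.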
Source: https://stackoverflow.com/questions/52218969/group-rows-based-on-column-sum-value