Group by every N records in T-SQL

旧时难觅i 2021-01-03 23:26

I have some performance test results in the database, and what I want to do is group every 1000 records (previously sorted in ascending order by date) and then aggregate each group, e.g. calculate its average.

4 Answers
  • 2021-01-03 23:27

    Credit for this goes to Yuck; I'm only posting as an answer so I could include a code block. I ran a count test to check whether it really grouped by 1000, or whether the first set came out at 999. It produced set sizes of exactly 1,000. Great query, Yuck.

        WITH T AS (
          SELECT RANK() OVER (ORDER BY sID) AS Rank, sID
          FROM docSVsys
        )
        SELECT (Rank - 1) / 1000 AS GroupID, COUNT(sID) AS GroupSize
        FROM T
        GROUP BY (Rank - 1) / 1000
        ORDER BY GroupID;
    
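    One subtlety worth flagging (my note, not part of the original answer): RANK() assigns equal ranks to ties and then skips values, so if the ordering column can contain duplicates the buckets can drift away from exactly 1,000 rows. ROW_NUMBER() never gaps, so a safer variant is:

        WITH T AS (
          -- ROW_NUMBER() numbers rows 1..n with no gaps, even on ties
          SELECT ROW_NUMBER() OVER (ORDER BY sID) AS RowNum, sID
          FROM docSVsys
        )
        SELECT (RowNum - 1) / 1000 AS GroupID, COUNT(sID) AS GroupSize
        FROM T
        GROUP BY (RowNum - 1) / 1000
        ORDER BY GroupID;
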
  • 2021-01-03 23:31

    I +1'd @Yuck, because I think that is a good answer. But it's worth mentioning NTILE().

    Reason being, if you have 10,010 records (for example), then you'll have 11 groupings -- the first 10 with 1000 in them, and the last with just 10.

    If you're comparing averages between each group of 1000, then you should either discard the last group as it's not a representative group, or...you could make all the groups the same size.

    NTILE() would make all groups the same size; the only caveat is that you'd need to know how many groups you wanted.

    So if your table had 25,250 records, you'd use NTILE(25), and your groupings would be approximately 1000 in size -- they'd actually be 1010 in size; the benefit being, they'd all be the same size, which might make them more relevant to each other in terms of whatever comparison analysis you're doing.

    You could get the number of groups (which is what NTILE() takes, rather than the group size) simply by

    DECLARE @ntile int
    SET @ntile = (SELECT COUNT(1) FROM myTable) / 1000
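
    One edge case to guard against (my addition, not in the original answer): if the table has fewer than 1,000 rows, @ntile comes out as 0 and NTILE(0) raises an error, so clamp it first:

    -- NTILE requires a positive integer; clamp to 1 for tables under 1,000 rows
    IF @ntile < 1 SET @ntile = 1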
    

    And then modifying @Yuck's approach with the NTILE() substitution:

    ;WITH myCTE AS (
      SELECT NTILE(@ntile) OVER (ORDER BY id) myGroup,
        col1, col2, ...
      FROM dbo.myTable
    )
    SELECT myGroup, col1, col2...
    FROM myCTE
    GROUP BY (myGroup), col1, col2...
    ;
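
    For intuition on how NTILE() splits a row count that doesn't divide evenly: the larger buckets come first. Here's a small self-contained demo (my sketch; it touches only the built-in sys.all_objects catalog view, so it runs as-is):

    -- 10 rows into 3 tiles: NTILE hands the remainder to the leading tiles,
    -- so the sizes come out as 4, 3, 3.
    WITH nums AS (
      SELECT TOP (10) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects
    ), tiled AS (
      SELECT NTILE(3) OVER (ORDER BY n) AS tile
      FROM nums
    )
    SELECT tile, COUNT(*) AS tile_size
    FROM tiled
    GROUP BY tile
    ORDER BY tile;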
    
  • 2021-01-03 23:34
    WITH T AS (
      SELECT RANK() OVER (ORDER BY ID) Rank,
        P.Field1, P.Field2, P.Value1, ...
      FROM P
    )
    SELECT (Rank - 1) / 1000 GroupID, AVG(...)
    FROM T
    GROUP BY ((Rank - 1) / 1000)
    ;
    

    Something like that should get you started. If you can provide your actual schema I can update as appropriate.
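
    To make that concrete against the question's scenario (rows sorted ascending by date, averaged per block of 1,000), here's a sketch; the Results table and its TestDate/ResponseTime columns are hypothetical stand-ins for the real schema:

    -- Hypothetical schema: Results(TestDate datetime, ResponseTime int)
    WITH T AS (
      -- Rank rows 1..n in ascending date order
      SELECT RANK() OVER (ORDER BY TestDate) AS Rank, ResponseTime
      FROM Results
    )
    SELECT (Rank - 1) / 1000 AS GroupID,  -- integer division: 0 for rows 1-1000, 1 for 1001-2000, ...
      AVG(ResponseTime) AS AvgResponseTime
    FROM T
    GROUP BY (Rank - 1) / 1000
    ORDER BY GroupID;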

  • 2021-01-03 23:48

    The answer above does not actually assign a unique group ID to each 1,000 records; adding Floor() is needed. The following will return all records from your table, with a unique GroupID for each 1,000 rows:

    WITH T AS (
      SELECT RANK() OVER (ORDER BY your_field) Rank,
        your_field
      FROM your_table
      WHERE your_field = 'your_criteria'
    )
    SELECT Floor((Rank-1) / 1000) GroupID, your_field
    FROM T
    

    And for my needs, I wanted my GroupID to be a random set of characters, so I changed the Floor(...) GroupID to:

    TO_HEX(SHA256(CONCAT(CAST(Floor((Rank-1) / 10) AS STRING),'seed1'))) GroupID
    

    Without the seed value, you and I would get exactly the same output, because we're just taking the SHA256 of the numbers 1, 2, 3, and so on. Adding the seed makes the output unique to your data while still being repeatable.

    This is BigQuery syntax. T-SQL might be slightly different.
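
    For a rough T-SQL equivalent of the hashed GroupID (my sketch, not tested against the author's data; note that T-SQL integer division already truncates, so Floor() isn't strictly needed there):

    -- HASHBYTES is the T-SQL analogue of SHA256(); CONVERT with style 2
    -- renders the varbinary digest as a hex string without the '0x' prefix.
    SELECT CONVERT(varchar(64),
        HASHBYTES('SHA2_256',
          CONCAT(CAST((Rank - 1) / 1000 AS varchar(20)), 'seed1')),
        2) AS GroupID,
      your_field
    FROM T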

    Lastly, if you want to leave off the last chunk that is not a full 1000, you can find it by doing:

    WITH T AS (
      SELECT RANK() OVER (ORDER BY your_field) Rank,
        your_field
      FROM your_table
      WHERE your_field = 'your_criteria'
    )
    SELECT Floor((Rank-1) / 1000) GroupID, your_field
    , COUNT(*) OVER(PARTITION BY TO_HEX(SHA256(CONCAT(CAST(Floor((Rank-1) / 1000) AS STRING),'seed1')))) AS CountInGroup
    FROM T
    ORDER BY CountInGroup
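
    In T-SQL, the partial chunk can be filtered out directly. The windowed count can't sit in a WHERE clause, but wrapping it in a second CTE works (again a sketch along the same lines as the query above):

    WITH T AS (
      SELECT RANK() OVER (ORDER BY your_field) AS Rank, your_field
      FROM your_table
      WHERE your_field = 'your_criteria'
    ), G AS (
      SELECT (Rank - 1) / 1000 AS GroupID, your_field,
        COUNT(*) OVER (PARTITION BY (Rank - 1) / 1000) AS CountInGroup
      FROM T
    )
    SELECT GroupID, your_field
    FROM G
    WHERE CountInGroup = 1000   -- keep only the full groups of 1,000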
    