Group by every N records in T-SQL

旧时难觅i 2021-01-03 23:26

I have some performance test results in the database, and what I want to do is group every 1000 records (previously sorted in ascending order by date) and then aggregate each group, e.g. calculate its average.

4 Answers
  • 2021-01-03 23:27

    Credit for this goes to Yuck; I'm only posting as an answer so I could include a code block. I ran a count test to check whether it really grouped by 1000, or whether the first set came out at 999. It produced set sizes of exactly 1,000. Great query, Yuck.

        WITH T AS (
          SELECT RANK() OVER (ORDER BY sID) AS Rank, sID
          FROM docSVsys
        )
        SELECT (Rank - 1) / 1000 AS GroupID, COUNT(sID) AS GroupSize
        FROM T
        GROUP BY (Rank - 1) / 1000
        ORDER BY GroupID;
    
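    One subtlety worth flagging (my note, not part of the original answer): RANK() assigns equal ranks to ties and then skips values, so if the ordering column can contain duplicates the buckets can drift away from exactly 1,000 rows. ROW_NUMBER() never gaps, so a safer variant is:

        WITH T AS (
          -- ROW_NUMBER() numbers rows 1..n with no gaps, even on ties
          SELECT ROW_NUMBER() OVER (ORDER BY sID) AS RowNum, sID
          FROM docSVsys
        )
        SELECT (RowNum - 1) / 1000 AS GroupID, COUNT(sID) AS GroupSize
        FROM T
        GROUP BY (RowNum - 1) / 1000
        ORDER BY GroupID;
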
  • 2021-01-03 23:31

    I +1'd @Yuck, because I think that is a good answer. But it's worth mentioning NTILE().

    Reason being, if you have 10,010 records (for example), then you'll have 11 groupings -- the first 10 with 1000 in them, and the last with just 10.

    If you're comparing averages between each group of 1000, then you should either discard the last group as it's not a representative group, or...you could make all the groups the same size.

    NTILE() would make all groups the same size; the only caveat is that you'd need to know how many groups you wanted.

    So if your table had 25,250 records, you'd use NTILE(25), and your groupings would be approximately 1000 in size -- they'd actually be 1010 in size; the benefit being, they'd all be the same size, which might make them more relevant to each other in terms of whatever comparison analysis you're doing.

    You could get the number of groups (which is what NTILE() takes, rather than the group size) simply by

    DECLARE @ntile int
    SET @ntile = (SELECT COUNT(1) FROM myTable) / 1000
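
    One edge case to guard against (my addition, not in the original answer): if the table has fewer than 1,000 rows, @ntile comes out as 0 and NTILE(0) raises an error, so clamp it first:

    -- NTILE requires a positive integer; clamp to 1 for tables under 1,000 rows
    IF @ntile < 1 SET @ntile = 1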
    

    And then modifying @Yuck's approach with the NTILE() substitution:

    ;WITH myCTE AS (
      SELECT NTILE(@ntile) OVER (ORDER BY id) myGroup,
        col1, col2, ...
      FROM dbo.myTable
    )
    SELECT myGroup, col1, col2...
    FROM myCTE
    GROUP BY (myGroup), col1, col2...
    ;
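
    For intuition on how NTILE() splits a row count that doesn't divide evenly: the larger buckets come first. Here's a small self-contained demo (my sketch; it touches only the built-in sys.all_objects catalog view, so it runs as-is):

    -- 10 rows into 3 tiles: NTILE hands the remainder to the leading tiles,
    -- so the sizes come out as 4, 3, 3.
    WITH nums AS (
      SELECT TOP (10) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects
    ), tiled AS (
      SELECT NTILE(3) OVER (ORDER BY n) AS tile
      FROM nums
    )
    SELECT tile, COUNT(*) AS tile_size
    FROM tiled
    GROUP BY tile
    ORDER BY tile;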
    
  • 2021-01-03 23:34
    WITH T AS (
      SELECT RANK() OVER (ORDER BY ID) Rank,
        P.Field1, P.Field2, P.Value1, ...
      FROM P
    )
    SELECT (Rank - 1) / 1000 GroupID, AVG(...)
    FROM T
    GROUP BY ((Rank - 1) / 1000)
    ;
    

    Something like that should get you started. If you can provide your actual schema I can update as appropriate.
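
    To make that concrete against the question's scenario (rows sorted ascending by date, averaged per block of 1,000), here's a sketch; the Results table and its TestDate/ResponseTime columns are hypothetical stand-ins for the real schema:

    -- Hypothetical schema: Results(TestDate datetime, ResponseTime int)
    WITH T AS (
      -- Rank rows 1..n in ascending date order
      SELECT RANK() OVER (ORDER BY TestDate) AS Rank, ResponseTime
      FROM Results
    )
    SELECT (Rank - 1) / 1000 AS GroupID,  -- integer division: 0 for rows 1-1000, 1 for 1001-2000, ...
      AVG(ResponseTime) AS AvgResponseTime
    FROM T
    GROUP BY (Rank - 1) / 1000
    ORDER BY GroupID;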

  • 2021-01-03 23:48

    The answer above does not actually assign a unique group ID to each 1,000 records; adding Floor() is needed. The following will return all records from your table, with a unique GroupID for each 1,000 rows:

    WITH T AS (
      SELECT RANK() OVER (ORDER BY your_field) Rank,
        your_field
      FROM your_table
      WHERE your_field = 'your_criteria'
    )
    SELECT Floor((Rank-1) / 1000) GroupID, your_field
    FROM T
    

    And for my needs, I wanted my GroupID to be a random set of characters, so I changed the Floor(...) GroupID to:

    TO_HEX(SHA256(CONCAT(CAST(Floor((Rank-1) / 10) AS STRING),'seed1'))) GroupID
    

    Without the seed value, you and I would get exactly the same output, because we're just taking the SHA256 of the numbers 1, 2, 3, and so on. Adding the seed makes the output unique to your data while still being repeatable.

    This is BigQuery syntax. T-SQL might be slightly different.
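
    For a rough T-SQL equivalent of the hashed GroupID (my sketch, not tested against the author's data; note that T-SQL integer division already truncates, so Floor() isn't strictly needed there):

    -- HASHBYTES is the T-SQL analogue of SHA256(); CONVERT with style 2
    -- renders the varbinary digest as a hex string without the '0x' prefix.
    SELECT CONVERT(varchar(64),
        HASHBYTES('SHA2_256',
          CONCAT(CAST((Rank - 1) / 1000 AS varchar(20)), 'seed1')),
        2) AS GroupID,
      your_field
    FROM T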

    Lastly, if you want to leave off the last chunk that is not a full 1000, you can find it by doing:

    WITH T AS (
      SELECT RANK() OVER (ORDER BY your_field) Rank,
        your_field
      FROM your_table
      WHERE your_field = 'your_criteria'
    )
    SELECT Floor((Rank-1) / 1000) GroupID, your_field
    , COUNT(*) OVER(PARTITION BY TO_HEX(SHA256(CONCAT(CAST(Floor((Rank-1) / 1000) AS STRING),'seed1')))) AS CountInGroup
    FROM T
    ORDER BY CountInGroup
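
    In T-SQL, the partial chunk can be filtered out directly. The windowed count can't sit in a WHERE clause, but wrapping it in a second CTE works (again a sketch along the same lines as the query above):

    WITH T AS (
      SELECT RANK() OVER (ORDER BY your_field) AS Rank, your_field
      FROM your_table
      WHERE your_field = 'your_criteria'
    ), G AS (
      SELECT (Rank - 1) / 1000 AS GroupID, your_field,
        COUNT(*) OVER (PARTITION BY (Rank - 1) / 1000) AS CountInGroup
      FROM T
    )
    SELECT GroupID, your_field
    FROM G
    WHERE CountInGroup = 1000   -- keep only the full groups of 1,000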
    