Preserving ORDER BY in SELECT INTO

前端 未结 8 657
南笙
南笙 2020-11-27 07:18

I have a T-SQL query that takes data from one table and copies it into a new table but only rows meeting a certain condition:

SELECT VibeFGEvents.* 
INTO Vib         


        
8条回答
  •  有刺的猬
    2020-11-27 08:12

    I've found a specific scenario where we want the new table to be created with a specific order in the columns' content:

    • Amount of rows is very big (from 200 to 2000 millions of rows), so we are using SELECT INTO instead of CREATE TABLE + INSERT because needs to be loaded as fast as possible (minimal logging). We have tested using the trace flag 610 for loading an already created empty table with a clustered index but still takes longer than the following approach.
    • We need the data to be ordered by specific columns for query performances, so we are creating a CLUSTERED INDEX just after the table is loaded. We discarded creating a non-clustered index because it would need another read for the data that's not included in the ordered columns from the index, and we discarded creating a full-covering non-clustered index because it would practically double the amount of space needed to hold the table.

    It happens that if you manage to somehow create the table with columns already "ordered", creating the clustered index (with the same order) takes a lot less time than when the data isn't ordered. And sometimes (you will have to test your case), ordering the rows in the SELECT INTO is faster than loading without order and creating the clustered index later.

    The problem is that SQL Server 2012+ will ignore the ORDER BY column list when doing INSERT INTO or when doing SELECT INTO. It will consider the ORDER BY columns if you specify an IDENTITY column on the SELECT INTO or if the inserted table has an IDENTITY column, but just to determine the identity values and not the actual storage order in the underlying table. In this case, it's likely that the sort will happen but not guaranteed as it's highly dependent on the execution plan.

    A trick we have found is that doing a SELECT INTO with the result of a UNION ALL makes the engine perform a SORT (not always an explicit SORT operator, sometimes a MERGE JOIN CONCATENATION, etc.) if you have an ORDER BY list. This way the select into already creates the new table in the order we are going to create the clustered index later and thus the index takes less time to create.

    So you can rewrite this query:

    SELECT
        FirstColumn = T.FirstColumn,
        SecondColumn = T.SecondColumn
    INTO
        #NewTable
    FROM
        VeryBigTable AS T
    ORDER BY            -- ORDER BY is ignored!
        FirstColumn,
        SecondColumn
    

    to

    SELECT
        FirstColumn = T.FirstColumn,
        SecondColumn = T.SecondColumn
    INTO
        #NewTable
    FROM
        VeryBigTable AS T
    
    UNION ALL
    
    -- A "fake" row to be deleted
    SELECT
        FirstColumn = 0,
        SecondColumn = 0
    
    ORDER BY
        FirstColumn,
        SecondColumn
    

    We have used this trick a few times, but I can't guarantee it will always sort. I'm just posting this as a possible workaround in case someone has a similar scenario.

提交回复
热议问题