I have a T-SQL query that takes data from one table and copies it into a new table but only rows meeting a certain condition:
SELECT VibeFGEvents.*
INTO Vib
I've found a specific scenario where we want the new table to be created with a specific order in the columns' content:
SELECT INTO
instead of CREATE TABLE + INSERT
because needs to be loaded as fast as possible (minimal logging). We have tested using the trace flag 610 for loading an already created empty table with a clustered index but still takes longer than the following approach.CLUSTERED INDEX
just after the table is loaded. We discarded creating a non-clustered index because it would need another read for the data that's not included in the ordered columns from the index, and we discarded creating a full-covering non-clustered index because it would practically double the amount of space needed to hold the table.It happens that if you manage to somehow create the table with columns already "ordered", creating the clustered index (with the same order) takes a lot less time than when the data isn't ordered. And sometimes (you will have to test your case), ordering the rows in the SELECT INTO
is faster than loading without order and creating the clustered index later.
The problem is that SQL Server 2012+ will ignore the ORDER BY
column list when doing INSERT INTO
or when doing SELECT INTO
. It will consider the ORDER BY
columns if you specify an IDENTITY
column on the SELECT INTO
or if the inserted table has an IDENTITY
column, but just to determine the identity values and not the actual storage order in the underlying table. In this case, it's likely that the sort will happen but not guaranteed as it's highly dependent on the execution plan.
A trick we have found is that doing a SELECT INTO
with the result of a UNION ALL
makes the engine perform a SORT
(not always an explicit SORT
operator, sometimes a MERGE JOIN CONCATENATION
, etc.) if you have an ORDER BY
list. This way the select into already creates the new table in the order we are going to create the clustered index later and thus the index takes less time to create.
So you can rewrite this query:
SELECT
FirstColumn = T.FirstColumn,
SecondColumn = T.SecondColumn
INTO
#NewTable
FROM
VeryBigTable AS T
ORDER BY -- ORDER BY is ignored!
FirstColumn,
SecondColumn
to
SELECT
FirstColumn = T.FirstColumn,
SecondColumn = T.SecondColumn
INTO
#NewTable
FROM
VeryBigTable AS T
UNION ALL
-- A "fake" row to be deleted
SELECT
FirstColumn = 0,
SecondColumn = 0
ORDER BY
FirstColumn,
SecondColumn
We have used this trick a few times, but I can't guarantee it will always sort. I'm just posting this as a possible workaround in case someone has a similar scenario.