I've seen many different ways to create and populate a numbers table. However, what is the best way to create and populate one? With "best" being defined from most to l
Some of the suggested methods rely on system objects (for example, sys.objects). They assume these system objects contain enough records to generate the numbers we need.
I would not rely on anything that does not belong to my application and over which I do not have full control. For example, the contents of these sys tables may change, or the tables may no longer be valid in a new version of SQL Server.
As a solution, we can create our own table with records and use it instead of these system objects. A table holding all the numbers should be fine if we know the range in advance; otherwise we could use a small table of our own and cross join it with itself.
The CTE-based solution works fine, but it has limits related to the nested loops.
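A minimal sketch of the own-table idea, assuming a hypothetical helper table dbo.Digits (the table name, column name, and the 0 to 9,999 range are illustrative choices, not from the answers here). It cross joins a ten-row table that the application fully controls, so nothing depends on how many rows a system view happens to hold:

```sql
-- Hypothetical helper table owned by the application: the digits 0..9
CREATE TABLE dbo.Digits (Digit tinyint NOT NULL PRIMARY KEY);

INSERT INTO dbo.Digits (Digit)
VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

-- Four cross-joined copies give 10 * 10 * 10 * 10 = 10,000 rows (0..9999).
-- Multiplying by an int literal promotes the tinyint digits to int.
SELECT d3.Digit * 1000 + d2.Digit * 100 + d1.Digit * 10 + d0.Digit AS Number
FROM dbo.Digits AS d0
CROSS JOIN dbo.Digits AS d1
CROSS JOIN dbo.Digits AS d2
CROSS JOIN dbo.Digits AS d3
ORDER BY Number;
```

Adding another cross-joined copy multiplies the range by ten, so the upper bound is set by the query rather than by the contents of any system table.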
This is a repackaging of the accepted answer, arranged so that you can compare the methods against each other yourself. The top 3 algorithms are compared (comments explain why the other methods are excluded), and you can run the script against your own setup to see how each performs with the sequence size you want.
SET NOCOUNT ON;
--
-- Set the count of numbers that you want in your sequence ...
--
DECLARE @NumberOfNumbers int = 10000000;
--
-- Some notes on choosing a useful length for your sequence ...
-- For a sequence of 100 numbers -- winner depends on preference of min/max/avg runtime ... (I prefer PhilKelley algo here - edit the algo so RowSet2 is max RowSet CTE)
-- For a sequence of 1k numbers -- winner depends on preference of min/max/avg runtime ... (Sadly PhilKelley algo is generally lowest ranked in this bucket, but could be tweaked to perform better)
-- For a sequence of 10k numbers -- a clear winner emerges for this bucket
-- For a sequence of 100k numbers -- do not test any looping methods at this size or above ...
-- the previous winner fails, a different method is needed to guarantee the full sequence desired
-- For a sequence of 1MM numbers -- the statistics aren't changing much between the algorithms - choose one based on your own goals or tweaks
-- For a sequence of 10MM numbers -- only one of the methods yields the desired sequence, and the numbers are much closer than for smaller sequences
DECLARE @TestIteration int = 0;
DECLARE @MaxIterations int = 10;
DECLARE @MethodName varchar(128);
-- DROP TABLE IF EXISTS requires SQL Server 2016 or later
DROP TABLE IF EXISTS #TimingTest
CREATE TABLE #TimingTest (MethodName varchar(128), TestIteration int, StartDate DateTime2, EndDate DateTime2, ElapsedTime decimal(38,0), ItemCount decimal(38,0), MaxNumber decimal(38,0), MinNumber decimal(38,0))
--
-- Conduct the test ...
--
WHILE @TestIteration < @MaxIterations
BEGIN
-- Be sure that the test moves forward
SET @TestIteration += 1;
/* -- This method has been removed, as it is BY FAR, the slowest method
-- This test shows that, looping should be avoided, likely at all costs, if one places a value / premium on speed of execution ...
--
-- METHOD - Fast looping
--
-- Prep for the test
DROP TABLE IF EXISTS [Numbers].[Test];
CREATE TABLE [Numbers].[Test] (Number INT NOT NULL);
-- Method information
SET @MethodName = 'FastLoop';
-- Record the start of the test
INSERT INTO #TimingTest(MethodName, TestIteration, StartDate)
SELECT @MethodName, @TestIteration, GETDATE()
-- Run the algorithm
DECLARE @i INT = 1;
WHILE @i <= @NumberOfNumbers
BEGIN
INSERT INTO [Numbers].[Test](Number) VALUES (@i);
SELECT @i = @i + 1;
END;
ALTER TABLE [Numbers].[Test] ADD CONSTRAINT PK_Numbers_Test_Number PRIMARY KEY CLUSTERED (Number)
-- Record the end of the test
UPDATE tt
SET
EndDate = GETDATE()
FROM #TimingTest tt
WHERE tt.MethodName = @MethodName
and tt.TestIteration = @TestIteration
-- And the stats about the numbers in the sequence
UPDATE tt
SET
ItemCount = results.ItemCount,
MaxNumber = results.MaxNumber,
MinNumber = results.MinNumber
FROM #TimingTest tt
CROSS JOIN (
SELECT COUNT(Number) as ItemCount, MAX(Number) as MaxNumber, MIN(Number) as MinNumber FROM [Numbers].[Test]
) results
WHERE tt.MethodName = @MethodName
and tt.TestIteration = @TestIteration
*/
/* -- This method requires GO statements, which would break the script, also - this answer does not appear to be the fastest *AND* seems to perform "magic"
--
-- METHOD - "Semi-Looping"
--
-- Prep for the test
DROP TABLE IF EXISTS [Numbers].[Test];
CREATE TABLE [Numbers].[Test] (Number INT NOT NULL);
-- Method information
SET @MethodName = 'SemiLoop';
-- Record the start of the test
INSERT INTO #TimingTest(MethodName, TestIteration, StartDate)
SELECT @MethodName, @TestIteration, GETDATE()
-- Run the algorithm
INSERT [Numbers].[Test] values (1);
-- GO --required
INSERT [Numbers].[Test] SELECT Number + (SELECT COUNT(*) FROM [Numbers].[Test]) FROM [Numbers].[Test]
-- GO 14 --will create 16384 total rows
ALTER TABLE [Numbers].[Test] ADD CONSTRAINT PK_Numbers_Test_Number PRIMARY KEY CLUSTERED (Number)
-- Record the end of the test
UPDATE tt
SET
EndDate = GETDATE()
FROM #TimingTest tt
WHERE tt.MethodName = @MethodName
and tt.TestIteration = @TestIteration
-- And the stats about the numbers in the sequence
UPDATE tt
SET
ItemCount = results.ItemCount,
MaxNumber = results.MaxNumber,
MinNumber = results.MinNumber
FROM #TimingTest tt
CROSS JOIN (
SELECT COUNT(Number) as ItemCount, MAX(Number) as MaxNumber, MIN(Number) as MinNumber FROM [Numbers].[Test]
) results
WHERE tt.MethodName = @MethodName
and tt.TestIteration = @TestIteration
*/
--
-- METHOD - Philip Kelley's algo
-- (needs tweaking to match the desired length of sequence in order to optimize its performance, relies more on the coder to properly tweak the algorithm)
--
-- Prep for the test
DROP TABLE IF EXISTS [Numbers].[Test];
CREATE TABLE [Numbers].[Test] (Number INT NOT NULL);
-- Method information
SET @MethodName = 'PhilKelley';
-- Record the start of the test
INSERT INTO #TimingTest(MethodName, TestIteration, StartDate)
SELECT @MethodName, @TestIteration, GETDATE()
-- Run the algorithm
; WITH
RowSet0 as (select 1 as Item union all select 1), -- 2 rows -- We only have to name the column in the first select, the second/union select inherits the column name
RowSet1 as (select 1 as Item from RowSet0 as A, RowSet0 as B), -- 4 rows
RowSet2 as (select 1 as Item from RowSet1 as A, RowSet1 as B), -- 16 rows
RowSet3 as (select 1 as Item from RowSet2 as A, RowSet2 as B), -- 256 rows
RowSet4 as (select 1 as Item from RowSet3 as A, RowSet3 as B), -- 65536 rows (65k)
RowSet5 as (select 1 as Item from RowSet4 as A, RowSet4 as B), -- 4294967296 rows (4BB)
-- Add more RowSetX to get higher and higher numbers of rows
-- Each successive RowSetX results in squaring the previously available number of rows
Tally as (select row_number() over (order by Item) as Number from RowSet5) -- This is what gives us the sequence of integers, always select from the terminal CTE expression
-- Note: testing of this specific use case has shown that making Tally as a sub-query instead of a terminal CTE expression is slower (always) - be sure to follow this pattern closely for max performance
INSERT INTO [Numbers].[Test] (Number)
SELECT o.Number
FROM Tally o
WHERE o.Number <= @NumberOfNumbers
ALTER TABLE [Numbers].[Test] ADD CONSTRAINT PK_Numbers_Test_Number PRIMARY KEY CLUSTERED (Number)
-- Record the end of the test
UPDATE tt
SET
EndDate = GETDATE()
FROM #TimingTest tt
WHERE tt.MethodName = @MethodName
and tt.TestIteration = @TestIteration
-- And the stats about the numbers in the sequence
UPDATE tt
SET
ItemCount = results.ItemCount,
MaxNumber = results.MaxNumber,
MinNumber = results.MinNumber
FROM #TimingTest tt
CROSS JOIN (
SELECT COUNT(Number) as ItemCount, MAX(Number) as MaxNumber, MIN(Number) as MinNumber FROM [Numbers].[Test]
) results
WHERE tt.MethodName = @MethodName
and tt.TestIteration = @TestIteration
--
-- METHOD - Mladen Prajdic answer
--
-- Prep for the test
DROP TABLE IF EXISTS [Numbers].[Test];
CREATE TABLE [Numbers].[Test] (Number INT NOT NULL);
-- Method information
SET @MethodName = 'MladenPrajdic';
-- Record the start of the test
INSERT INTO #TimingTest(MethodName, TestIteration, StartDate)
SELECT @MethodName, @TestIteration, GETDATE()
-- Run the algorithm
INSERT INTO [Numbers].[Test](Number)
SELECT TOP (@NumberOfNumbers) row_number() over(order by t1.number) as N
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
ALTER TABLE [Numbers].[Test] ADD CONSTRAINT PK_Numbers_Test_Number PRIMARY KEY CLUSTERED (Number)
-- Record the end of the test
UPDATE tt
SET
EndDate = GETDATE()
FROM #TimingTest tt
WHERE tt.MethodName = @MethodName
and tt.TestIteration = @TestIteration
-- And the stats about the numbers in the sequence
UPDATE tt
SET
ItemCount = results.ItemCount,
MaxNumber = results.MaxNumber,
MinNumber = results.MinNumber
FROM #TimingTest tt
CROSS JOIN (
SELECT COUNT(Number) as ItemCount, MAX(Number) as MaxNumber, MIN(Number) as MinNumber FROM [Numbers].[Test]
) results
WHERE tt.MethodName = @MethodName
and tt.TestIteration = @TestIteration
--
-- METHOD - Single INSERT
--
-- Prep for the test
DROP TABLE IF EXISTS [Numbers].[Test];
-- The Table creation is part of this algorithm ...
-- Method information
SET @MethodName = 'SingleInsert';
-- Record the start of the test
INSERT INTO #TimingTest(MethodName, TestIteration, StartDate)
SELECT @MethodName, @TestIteration, GETDATE()
-- Run the algorithm
SELECT TOP (@NumberOfNumbers) IDENTITY(int,1,1) AS Number
INTO [Numbers].[Test]
FROM sys.objects s1 -- use sys.columns if you don't get enough rows returned to generate all the numbers you need
CROSS JOIN sys.objects s2 -- use sys.columns if you don't get enough rows returned to generate all the numbers you need
ALTER TABLE [Numbers].[Test] ADD CONSTRAINT PK_Numbers_Test_Number PRIMARY KEY CLUSTERED (Number)
-- Record the end of the test
UPDATE tt
SET
EndDate = GETDATE()
FROM #TimingTest tt
WHERE tt.MethodName = @MethodName
and tt.TestIteration = @TestIteration
-- And the stats about the numbers in the sequence
UPDATE tt
SET
ItemCount = results.ItemCount,
MaxNumber = results.MaxNumber,
MinNumber = results.MinNumber
FROM #TimingTest tt
CROSS JOIN (
SELECT COUNT(Number) as ItemCount, MAX(Number) as MaxNumber, MIN(Number) as MinNumber FROM [Numbers].[Test]
) results
WHERE tt.MethodName = @MethodName
and tt.TestIteration = @TestIteration
END
-- Calculate the timespan for each of the runs
UPDATE tt
SET
ElapsedTime = DATEDIFF(MICROSECOND, StartDate, EndDate)
FROM #TimingTest tt
--
-- Report the results ...
--
SELECT
MethodName, AVG(ElapsedTime) / AVG(ItemCount) as TimePerRecord, CAST(AVG(ItemCount) as bigint) as SequenceLength,
MAX(ElapsedTime) as MaxTime, MIN(ElapsedTime) as MinTime,
MAX(MaxNumber) as MaxNumber, MIN(MinNumber) as MinNumber
FROM #TimingTest tt
GROUP by tt.MethodName
ORDER BY TimePerRecord ASC, MaxTime ASC, MinTime ASC
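The script above creates and drops [Numbers].[Test], so it assumes a schema named Numbers already exists in the current database. If it does not, a one-time setup along these lines should work (a sketch, not part of the original script; run it before the benchmark):

```sql
-- CREATE SCHEMA must be the only statement in its batch,
-- so it is wrapped in EXEC to allow the existence check.
IF SCHEMA_ID(N'Numbers') IS NULL
    EXEC (N'CREATE SCHEMA [Numbers]');
```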
Here is a short and fast in-memory solution that I came up with, using the table value constructors introduced in SQL Server 2008.
It returns 1,000,000 rows; you can add or remove CROSS JOINs, or use a TOP clause, to change this:
;WITH v AS (SELECT * FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) v(z))
SELECT N FROM (SELECT ROW_NUMBER() OVER (ORDER BY v1.z)-1 N FROM v v1
CROSS JOIN v v2 CROSS JOIN v v3 CROSS JOIN v v4 CROSS JOIN v v5 CROSS JOIN v v6) Nums
Note that this could be quickly calculated on the fly, or (even better) stored in a permanent table (just add an INTO clause after the SELECT N segment) with a primary key on the N field for improved efficiency.
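As a sketch of the permanent-table variant described above: the table name dbo.NumbersFromValues and the constraint name are hypothetical, and the cast to int is my own choice (ROW_NUMBER() returns bigint). The explicit ALTER COLUMN is there because a column created by SELECT ... INTO from an expression may be nullable, and a primary key requires NOT NULL.

```sql
;WITH v AS (SELECT * FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) v(z))
SELECT N
INTO dbo.NumbersFromValues   -- hypothetical table name
FROM (SELECT CAST(ROW_NUMBER() OVER (ORDER BY v1.z) - 1 AS int) AS N
      FROM v v1
      CROSS JOIN v v2 CROSS JOIN v v3 CROSS JOIN v v4
      CROSS JOIN v v5 CROSS JOIN v v6) Nums;

-- Make sure the column is NOT NULL, then add the primary key on N
ALTER TABLE dbo.NumbersFromValues ALTER COLUMN N int NOT NULL;
ALTER TABLE dbo.NumbersFromValues
    ADD CONSTRAINT PK_NumbersFromValues PRIMARY KEY CLUSTERED (N);
```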
Here are some code examples taken from the web and from answers to this question.
For each method, I have modified the original code so that each uses the same table and column (NumbersTest and Number) with 10,000 rows, or as close to that as possible. I have also provided links to the place of origin.
METHOD 1 here is a very slow looping method from here
avg 13.01 seconds
ran 3 times removed highest, here are times in seconds: 12.42, 13.60
DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest(Number INT IDENTITY(1,1))
SET NOCOUNT ON
WHILE COALESCE(SCOPE_IDENTITY(), 0) < 100000
BEGIN
INSERT dbo.NumbersTest DEFAULT VALUES
END
SET NOCOUNT OFF
-- Add a primary key/clustered index to the numbers table
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE())/1000.0)+' seconds'
SELECT COUNT(*) FROM NumbersTest
METHOD 2 here is a much faster looping one from here
avg 1.1658 seconds
ran 11 times removed highest, here are times in seconds: 1.117, 1.140, 1.203, 1.170, 1.173, 1.156, 1.203, 1.153, 1.173, 1.170
DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number INT NOT NULL);
DECLARE @i INT;
SELECT @i = 1;
SET NOCOUNT ON
WHILE @i <= 10000
BEGIN
INSERT INTO dbo.NumbersTest(Number) VALUES (@i);
SELECT @i = @i + 1;
END;
SET NOCOUNT OFF
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE())/1000.0)+' seconds'
SELECT COUNT(*) FROM NumbersTest
METHOD 3 Here is a single INSERT based on code from here
avg 488.6 milliseconds
ran 11 times removed highest, here are times in milliseconds: 686, 673, 623, 686,343,343,376,360,343,453
DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number int not null)
;WITH Nums(Number) AS
(SELECT 1 AS Number
UNION ALL
SELECT Number+1 FROM Nums where Number<10000
)
insert into NumbersTest(Number)
select Number from Nums option(maxrecursion 10000)
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest
METHOD 4 here is a "semi-looping" method from here
avg 348.3 milliseconds (it was hard to get good timing because of the "GO" in the middle of the code, any suggestions would be appreciated)
ran 11 times removed highest, here are times in milliseconds: 356, 360, 283, 346, 360, 376, 326, 373, 330, 373
DROP TABLE NumbersTest
DROP TABLE #RunDate
CREATE TABLE #RunDate (RunDate datetime)
INSERT INTO #RunDate VALUES(GETDATE())
CREATE TABLE NumbersTest (Number int NOT NULL);
INSERT NumbersTest values (1);
GO --required
INSERT NumbersTest SELECT Number + (SELECT COUNT(*) FROM NumbersTest) FROM NumbersTest
GO 14 --will create 16384 total rows
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
SELECT CONVERT(varchar(20),datediff(ms,RunDate,GETDATE()))+' milliseconds' FROM #RunDate
SELECT COUNT(*) FROM NumbersTest
METHOD 5 here is a single INSERT from Philip Kelley's answer
avg 92.7 milliseconds
ran 11 times removed highest, here are times in milliseconds: 80, 96, 96, 93, 110, 110, 80, 76, 93, 93
DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number int not null)
;WITH
Pass0 as (select 1 as C union all select 1), --2 rows
Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows
Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows
Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows
Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65536 rows
--I removed Pass5, since I'm only populating the Numbers table to 10,000
Tally as (select row_number() over(order by C) as Number from Pass4)
INSERT NumbersTest
(Number)
SELECT Number
FROM Tally
WHERE Number <= 10000
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest
METHOD 6 here is a single INSERT from Mladen Prajdic answer
avg 82.3 milliseconds
ran 11 times removed highest, here are times in milliseconds: 80, 80, 93, 76, 93, 63, 93, 76, 93, 76
DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
CREATE TABLE NumbersTest (Number int not null)
INSERT INTO NumbersTest(Number)
SELECT TOP 10000 row_number() over(order by t1.number) as N
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number);
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest
METHOD 7 here is a single INSERT based on the code from here
avg 56.3 milliseconds
ran 11 times removed highest, here are times in milliseconds: 63, 50, 63, 46, 60, 63, 63, 46, 63, 46
DROP TABLE NumbersTest
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO NumbersTest
FROM sys.objects s1 --use sys.columns if you don't get enough rows returned to generate all the numbers you need
CROSS JOIN sys.objects s2 --use sys.columns if you don't get enough rows returned to generate all the numbers you need
ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
SELECT COUNT(*) FROM NumbersTest
After looking at all these methods, I really like Method 7: it was the fastest, and the code is fairly simple too.
If you're just doing this in either SQL Server Management Studio or sqlcmd.exe, you can use the fact that the batch separator allows you to repeat the batch:
CREATE TABLE Number (N INT IDENTITY(1,1) PRIMARY KEY NOT NULL);
GO
INSERT INTO Number DEFAULT VALUES;
GO 100000
This will insert 100000 records into the Number table using the default value of the next identity.
It's slow. It compares to METHOD 1 in @KM.'s answer, which is the slowest of the examples. However, it's about as code light as it gets. You could speed it up somewhat by adding the primary key constraint after the insert batch.
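The suggested speed-up could look like the following sketch: create the table without the key, run the repeated batch, and add the primary key afterwards. The constraint name PK_Number is an illustrative choice, and I have not timed this variant.

```sql
-- Create the table as a heap first (no primary key yet)
CREATE TABLE Number (N INT IDENTITY(1,1) NOT NULL);
GO
INSERT INTO Number DEFAULT VALUES;
GO 100000
-- Build the clustered primary key once, after all rows are in
ALTER TABLE Number ADD CONSTRAINT PK_Number PRIMARY KEY CLUSTERED (N);
GO
```

As with the original, GO and its repeat count are features of SQL Server Management Studio and sqlcmd, not T-SQL statements, so this only works when run through one of those tools.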
I know this thread is old and answered, but there is a way to squeeze a little extra performance out of Method 7:
Instead of this (essentially method 7 but with some ease of use polish):
DECLARE @BIT AS BIT = 0
IF OBJECT_ID('tempdb..#TALLY') IS NOT NULL
DROP TABLE #TALLY
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO #TALLY
FROM sys.objects s1 --use sys.columns if you don't get enough rows returned to generate all the numbers you need
CROSS JOIN sys.objects s2 --use sys.co
ALTER TABLE #TALLY ADD PRIMARY KEY(Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
Try this:
DECLARE @BIT AS BIT = 0
IF OBJECT_ID('tempdb..#TALLY') IS NOT NULL
DROP TABLE #TALLY
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO #TALLY
FROM (SELECT @BIT [X] UNION ALL SELECT @BIT) [T2]
CROSS JOIN (SELECT @BIT [X] UNION ALL SELECT @BIT) [T4]
CROSS JOIN (SELECT @BIT [X] UNION ALL SELECT @BIT) [T8]
CROSS JOIN (SELECT @BIT [X] UNION ALL SELECT @BIT) [T16]
CROSS JOIN (SELECT @BIT [X] UNION ALL SELECT @BIT) [T32]
CROSS JOIN (SELECT @BIT [X] UNION ALL SELECT @BIT) [T64]
CROSS JOIN (SELECT @BIT [X] UNION ALL SELECT @BIT) [T128]
CROSS JOIN (SELECT @BIT [X] UNION ALL SELECT @BIT) [T256]
CROSS JOIN (SELECT @BIT [X] UNION ALL SELECT @BIT) [T512]
CROSS JOIN (SELECT @BIT [X] UNION ALL SELECT @BIT) [T1024]
CROSS JOIN (SELECT @BIT [X] UNION ALL SELECT @BIT) [T2048]
CROSS JOIN (SELECT @BIT [X] UNION ALL SELECT @BIT) [T4096]
CROSS JOIN (SELECT @BIT [X] UNION ALL SELECT @BIT) [T8192]
CROSS JOIN (SELECT @BIT [X] UNION ALL SELECT @BIT) [T16384]
ALTER TABLE #TALLY ADD PRIMARY KEY(Number)
PRINT CONVERT(varchar(20),datediff(ms,@RunDate,GETDATE()))+' milliseconds'
On my server this takes ~10 ms, as opposed to ~16-20 ms when selecting from sys.objects. It also has the added benefit of not depending on how many objects are in sys.objects. While relying on sys.objects is pretty safe, it's technically a dependency, and this version is faster anyway. I think the speed boost is down to using BITs. If you change:
DECLARE @BIT AS BIT = 0
to:
DECLARE @BIT AS BIGINT = 0
It adds ~8-10 ms to the total time on my server. That said, when you scale up to 1,000,000 records BIT vs BIGINT doesn't appreciably affect my query anymore, but it still runs around ~680ms vs ~730ms from sys.objects.