The most elegant way to generate permutations in SQL Server

面向向阳花 2020-11-29 07:00

Given the following table:

Index | Element
---------------
  1   |    A
  2   |    B
  3   |    C
  4   |    D

We want to generate all the permutations of these elements.

10 answers
  • 2020-11-29 07:11

    After making some perhaps snarky comments, this problem stuck in my brain all evening, and I eventually came up with the following set-based approach. I believe it definitely qualifies as "elegant", but then I also think it qualifies as "kinda dumb". You make the call.

    First, set up some tables:

    --  For testing purposes
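    IF OBJECT_ID('Source', 'U')  IS NOT NULL DROP TABLE Source
    IF OBJECT_ID('Numbers', 'U') IS NOT NULL DROP TABLE Numbers
    IF OBJECT_ID('Results', 'U') IS NOT NULL DROP TABLE Results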
    
    
    --  Add as many rows as need be processed--though note that you get N! (number of rows, factorial) results,
    --  and that gets big fast. The Identity column must start at 1, or the algorithm will have to be adjusted.
    --  Element could be more than char(1), though the algorithm would have to be adjusted again, and each element
    --  must be the same length.
    CREATE TABLE Source
     (
       SourceId  int      not null  identity(1,1)
      ,Element   char(1)  not null
     )
    
    INSERT Source (Element) values ('A')
    INSERT Source (Element) values ('B')
    INSERT Source (Element) values ('C')
    INSERT Source (Element) values ('D')
    --INSERT Source (Element) values ('E')
    --INSERT Source (Element) values ('F')
    
    
    --  This is a standard Tally table (or "table of numbers")
    --  It only needs to be as long as there are elements in table Source
    CREATE TABLE Numbers (Number int not null)
    INSERT Numbers (Number) values (1)
    INSERT Numbers (Number) values (2)
    INSERT Numbers (Number) values (3)
    INSERT Numbers (Number) values (4)
    INSERT Numbers (Number) values (5)
    INSERT Numbers (Number) values (6)
    INSERT Numbers (Number) values (7)
    INSERT Numbers (Number) values (8)
    INSERT Numbers (Number) values (9)
    INSERT Numbers (Number) values (10)
    
    
    --  Results are iteratively built here. This could be a temp table. An index on "Length" might make runs
    --  faster for large sets.  Combo must be at least as long as there are characters to be permuted.
    CREATE TABLE Results
     (
       Combo   varchar(10)  not null
      ,Length  int          not null
     )
    

    Here's the routine:

    SET NOCOUNT on
    
    DECLARE
      @Loop     int
     ,@MaxLoop  int
    
    
    --  How many elements there are to process
    SELECT @MaxLoop = max(SourceId)
     from Source
    
    
    --  Initialize first value
    TRUNCATE TABLE Results
    INSERT Results (Combo, Length)
     select Element, 1
      from Source
      where SourceId = 1
    
    SET @Loop = 2
    
    --  Iterate to add each element after the first
    WHILE @Loop <= @MaxLoop
     BEGIN
    
        --  See comments below. Note that the "distinct" removes duplicates, in case a given value
        --  is included more than once
        INSERT Results (Combo, Length)
         select distinct
            left(re.Combo, @Loop - nm.Number)
            + so.Element
            + right(re.Combo, nm.Number - 1)
           ,@Loop
          from Results re
           inner join Numbers nm
            on nm.Number <= @Loop
           inner join Source so
            on so.SourceId = @Loop
          where re.Length = @Loop - 1
    
        --  For performance, add this in if sets will be large
        --DELETE Results
        -- where Length <> @Loop
    
        SET @Loop = @Loop + 1
     END
    
    --  Show results
    SELECT *
     from Results
     where Length = @MaxLoop
     order by Combo
    

    The general idea is: when adding a new element (say "B") to any string (say, "A"), to catch all permutations you would add B to all possible positions (Ba, aB), resulting in a new set of strings. Then iterate: Add a new element (C) to each position in a string (AB becomes Cab, aCb, abC), for all strings (Cba, bCa, baC), and you have the set of permutations. Iterate over each result set with the next character until you run out of characters... or resources. 10 elements is 3.6 million permutations, roughly 48MB with the above algorithm, and 14 (unique) elements would hit 87 billion permutations and 1.163 terabytes.
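
    To look at a single insertion step in isolation, the sketch below applies the same LEFT/RIGHT expression to one string. The variables @Combo and @NewElement and the inline number list are illustrative stand-ins for the Results, Source and Numbers tables, not part of the routine above.

    ```sql
    -- Sketch only: insert 'C' at every position of 'AB'.
    DECLARE @Combo varchar(10) = 'AB', @NewElement char(1) = 'C';
    DECLARE @Loop int = LEN(@Combo) + 1;   -- length of the strings being produced

    SELECT LEFT(@Combo, @Loop - nm.Number)
           + @NewElement
           + RIGHT(@Combo, nm.Number - 1) AS Combo
    FROM (VALUES (1), (2), (3)) AS nm (Number)   -- hypothetical inline tally
    WHERE nm.Number <= @Loop
    ORDER BY nm.Number;
    -- Number = 1 gives ABC, Number = 2 gives ACB, Number = 3 gives CAB
    ```

    Each value of Number picks how many characters of the existing string end up to the right of the new element, which is why the tally only needs as many rows as there are elements.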

    I'm sure it could eventually be wedged into a CTE, but in the end all that would be is a glorified loop. The logic is clearer this way, and I can't help but think the CTE execution plan would be a nightmare.

  • 2020-11-29 07:11

    Current solution using a recursive CTE.

    -- The base elements
    Declare @Number Table( Element varchar(MAX), Id varchar(MAX) )
    Insert Into @Number Values ( 'A', '01')
    Insert Into @Number Values ( 'B', '02')
    Insert Into @Number Values ( 'C', '03')
    Insert Into @Number Values ( 'D', '04')
    
    -- Number of elements
    Declare @ElementsNumber int
    Select  @ElementsNumber = COUNT(*)
    From    @Number;
    
    
    
    -- Permute!
    With Permutations(   Permutation,   -- The permutation generated
                         Ids,            -- Which elements were used in the permutation
                         Depth )         -- The permutation length
    As
    (
        Select  Element,
                Id + ';',
                Depth = 1
        From    @Number
        Union All
        Select  Permutation + ' ' + Element,
                Ids + Id + ';',
                Depth = Depth + 1
        From    Permutations,
                @Number
        Where   Depth < @ElementsNumber And -- Generate only the required permutation number
                Ids Not like '%' + Id + ';%' -- Do not repeat elements in the permutation (this is the reason why we need the 'Ids' column) 
    )
    Select  Permutation
    From    Permutations
    Where   Depth = @ElementsNumber
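
    The fixed-width ids ('01', '02', ...) presumably matter for the NOT LIKE test: with unpadded ids and only a trailing ';', an id such as 1 would also match inside '11;'. A small sketch of that effect, using made-up values for a hypothetical 11-element case:

    ```sql
    -- Illustrative values only: the path '11;' has used element 11, and we test element 1.
    SELECT
        CASE WHEN '11;' LIKE '%' + '1'  + ';%' THEN 'treated as used' ELSE 'free' END AS Unpadded,
        CASE WHEN '11;' LIKE '%' + '01' + ';%' THEN 'treated as used' ELSE 'free' END AS Padded;
    -- Unpadded = 'treated as used' (a false positive), Padded = 'free'
    ```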
    
  • 2020-11-29 07:16

    Way too much rust on my SQL skills, but I took a different tack for a similar problem and thought it worth sharing.

    -- Table1: X strings in a single column, Uno
    -- Table2: Y strings in a single column, Dos
    
    -- CROSS JOIN takes no ON clause in T-SQL
    (SELECT Uno, Dos
    FROM Table1
    CROSS JOIN Table2)
        UNION
    (SELECT  Dos, Uno
    FROM Table1
    CROSS JOIN Table2)
    

    Same principle for 3 tables with an added CROSS JOIN

    (SELECT  Tres, Uno, Dos
    FROM Table1
    CROSS JOIN Table2
        CROSS JOIN Table3)
    

    although it takes 6 cross-join sets in the union.
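
    For three tables, the six column orders can be spelled out as in the sketch below. The table variables @Table1 to @Table3, their sample rows and the aliases C1 to C3 are illustrative only, standing in for Table1, Table2 and Table3.

    ```sql
    -- Hypothetical one-row tables so the output is easy to check by eye.
    DECLARE @Table1 TABLE (Uno  varchar(10));
    DECLARE @Table2 TABLE (Dos  varchar(10));
    DECLARE @Table3 TABLE (Tres varchar(10));
    INSERT @Table1 VALUES ('a');
    INSERT @Table2 VALUES ('b');
    INSERT @Table3 VALUES ('c');

    -- One SELECT per column order; UNION also removes duplicate rows.
    SELECT Uno AS C1, Dos AS C2, Tres AS C3 FROM @Table1 CROSS JOIN @Table2 CROSS JOIN @Table3
    UNION
    SELECT Uno,  Tres, Dos  FROM @Table1 CROSS JOIN @Table2 CROSS JOIN @Table3
    UNION
    SELECT Dos,  Uno,  Tres FROM @Table1 CROSS JOIN @Table2 CROSS JOIN @Table3
    UNION
    SELECT Dos,  Tres, Uno  FROM @Table1 CROSS JOIN @Table2 CROSS JOIN @Table3
    UNION
    SELECT Tres, Uno,  Dos  FROM @Table1 CROSS JOIN @Table2 CROSS JOIN @Table3
    UNION
    SELECT Tres, Dos,  Uno  FROM @Table1 CROSS JOIN @Table2 CROSS JOIN @Table3;
    ```

    With one row per table this returns six rows, one per ordering of a, b and c.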

  • 2020-11-29 07:24

    This method uses a binary mask to select the correct rows:

    ;with src(t,n,p) as (
    select Element, [Index], power(2,[Index]-1)
    from [table]   -- the question's table; Index and table are reserved words, hence the brackets
    )
    select s1.t+s2.t+s3.t+s4.t
    from src s1, src s2, src s3, src s4
    where s1.p+s2.p+s3.p+s4.p=power(2,4)-1
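
    The mask works because each element gets its own bit, and n powers of two can only add up to 2^n - 1 when every bit appears exactly once. A runnable sketch of the same idea for three elements follows; the table variable @Source and the Mask column are assumptions added for illustration.

    ```sql
    -- Hypothetical stand-in for the question's table.
    DECLARE @Source TABLE ([Index] int, Element char(1));
    INSERT @Source VALUES (1, 'A'), (2, 'B'), (3, 'C');

    WITH src (t, p) AS (
        SELECT Element, POWER(2, [Index] - 1)   -- A = 1, B = 2, C = 4
        FROM @Source
    )
    SELECT s1.t + s2.t + s3.t AS Permutation,
           s1.p + s2.p + s3.p AS Mask
    FROM src s1 CROSS JOIN src s2 CROSS JOIN src s3
    WHERE s1.p + s2.p + s3.p = POWER(2, 3) - 1;   -- keep only rows whose bits sum to 7
    ```

    Rows such as AAB (1 + 1 + 2 = 4) are built by the cross join and then discarded by the filter, which is the inefficiency the later revision of this answer tries to avoid.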
    

    My original post:

    declare @t varchar(4) = 'ABCD'
    
    ;with src(t,n,p) as (
    select substring(@t,1,1),1,power(2,0)
    union all
    select substring(@t,n+1,1),n+1,power(2,n)
    from src
    where n < len(@t)
    )
    select s1.t+s2.t+s3.t+s4.t
    from src s1, src s2, src s3, src s4
    where s1.p+s2.p+s3.p+s4.p=power(2,len(@t))-1
    

    This is one of those problems that haunts you. I liked the simplicity of my original answer but there was this issue where I was still building all the possible solutions and then selecting the correct ones. One more try to make this process more efficient by only building the solutions that were correct yielded this answer. Add a character to the string only if that character didn't exist in the string. Patindex seemed like the perfect companion for a CTE solution. Here it is.

    declare @t varchar(10) = 'ABCDEFGHIJ'
    
    ;with s(t,n) as (
    select substring(@t,1,1),1
    union all
    select substring(@t,n+1,1),n+1
    from s where n<len(@t)
    )
    ,j(t) as (
    select cast(t as varchar(10)) from s
    union all
    select cast(j.t+s.t as varchar(10))
    from j,s where patindex('%'+s.t+'%',j.t)=0
    )
    select t from j where len(t)=len(@t)
    

    I was able to build all 3.6 million solutions in 3 minutes and 2 seconds. Hopefully this solution will not get missed just because it's not the first.
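
    One caveat with PATINDEX is that characters such as %, _ and [ are treated as wildcards in the pattern. If the input might contain them, a variant using CHARINDEX (a literal search) could look like the sketch below. This is a swapped-in function rather than the query timed above, and the three-character input is only for illustration.

    ```sql
    DECLARE @t varchar(10) = 'ABC';

    WITH s (t, n) AS (
        -- split the string into one row per character
        SELECT SUBSTRING(@t, 1, 1), 1
        UNION ALL
        SELECT SUBSTRING(@t, n + 1, 1), n + 1
        FROM s WHERE n < LEN(@t)
    ),
    j (t) AS (
        SELECT CAST(t AS varchar(10)) FROM s
        UNION ALL
        -- append a character only if it is not already in the string
        SELECT CAST(j.t + s.t AS varchar(10))
        FROM j CROSS JOIN s
        WHERE CHARINDEX(s.t, j.t) = 0
    )
    SELECT t FROM j WHERE LEN(t) = LEN(@t) ORDER BY t;
    -- returns ABC, ACB, BAC, BCA, CAB, CBA
    ```

    Like the original, this assumes the characters in @t are distinct.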

  • 2020-11-29 07:26

    --Hopefully this is a quick solution, just change the values going into #X

    IF OBJECT_ID('tempdb.dbo.#X', 'U') IS NOT NULL  DROP TABLE #X; CREATE table #X([Opt] [nvarchar](10) NOT NULL)
    Insert into #X values('a'),('b'),('c'),('d')
    declare @pSQL NVarChar(max)='select * from #X X1 ', @pN int =(select count(*) from #X), @pC int = 0;
    while @pC<@pN begin
    if @pC>0 set  @pSQL = concat(@pSQL,' cross join #X X', @pC+1);
    set @pC = @pC +1;
    end
    execute(@pSQL)
    

    --or as single column result

    IF OBJECT_ID('tempdb.dbo.#X', 'U') IS NOT NULL  DROP TABLE #X; CREATE table #X([Opt] [nvarchar](10) NOT NULL)
    Insert into #X values('a'),('b'),('c'),('d')
    declare @pSQL NVarChar(max)=' as R from #X X1 ',@pSelect NVarChar(Max)=' ',@pJoin NVarChar(Max)='', @pN int =(select count(*) from #X), @pC int = 0;
    while @pC<@pN begin
    if @pC>0 begin set @pJoin = concat(@pJoin ,' cross join #X X', @pC+1); set @pSelect = concat(@pSelect ,'+ X', @pC+1,'.Opt ') end
    set @pC = @pC +1;
    end
    set @pSQL = concat ('select X1.Opt', @pSelect,@pSQL ,@pJoin)
    exec(@pSQL)
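
    Note that plain cross joins with no filter return every combination, including rows that repeat a value (n^n rows rather than n!). One possible way to keep only permutations is to have each new alias exclude the values already picked. The sketch below assumes the values in #X are distinct; the variable @pPrev is introduced here for illustration.

    ```sql
    IF OBJECT_ID('tempdb..#X') IS NOT NULL DROP TABLE #X;
    CREATE TABLE #X (Opt nvarchar(10) NOT NULL);
    INSERT INTO #X VALUES ('a'), ('b'), ('c');

    DECLARE @pSelect nvarchar(max) = N'X1.Opt',   -- concatenated output expression
            @pJoin   nvarchar(max) = N'',         -- generated joins
            @pPrev   nvarchar(max) = N'X1.Opt',   -- columns already used (hypothetical helper)
            @pN      int = (SELECT COUNT(*) FROM #X),
            @pC      int = 2;

    WHILE @pC <= @pN
    BEGIN
        -- each new alias may only take a value not chosen by the earlier aliases
        SET @pJoin   = CONCAT(@pJoin, N' JOIN #X X', @pC, N' ON X', @pC, N'.Opt NOT IN (', @pPrev, N')');
        SET @pSelect = CONCAT(@pSelect, N' + X', @pC, N'.Opt');
        SET @pPrev   = CONCAT(@pPrev, N', X', @pC, N'.Opt');
        SET @pC += 1;
    END

    DECLARE @pSQL nvarchar(max) = CONCAT(N'SELECT ', @pSelect, N' AS R FROM #X X1', @pJoin);
    EXEC (@pSQL);
    ```

    For the three sample rows the generated statement joins X2 on X2.Opt NOT IN (X1.Opt) and X3 on X3.Opt NOT IN (X1.Opt, X2.Opt), which yields six rows.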
    
  • 2020-11-29 07:27
    DECLARE @s VARCHAR(5);
    SET @s = 'ABCDE';
    
    WITH Subsets AS (
    SELECT CAST(SUBSTRING(@s, Number, 1) AS VARCHAR(5)) AS Token,
    CAST('.'+CAST(Number AS CHAR(1))+'.' AS VARCHAR(11)) AS Permutation,
    CAST(1 AS INT) AS Iteration
    FROM dbo.Numbers WHERE Number BETWEEN 1 AND 5
    UNION ALL
    SELECT CAST(Token+SUBSTRING(@s, Number, 1) AS VARCHAR(5)) AS Token,
    CAST(Permutation+CAST(Number AS CHAR(1))+'.' AS VARCHAR(11)) AS Permutation,
    s.Iteration + 1 AS Iteration
    FROM Subsets s JOIN dbo.Numbers n ON s.Permutation NOT LIKE '%.'+CAST(Number AS CHAR(1))+'.%'
    AND s.Iteration < 5 AND Number BETWEEN 1 AND 5
    --AND s.Iteration = (SELECT MAX(Iteration) FROM Subsets)
    )
    SELECT * FROM Subsets
    WHERE Iteration = 5
    ORDER BY Permutation
    
    Token Permutation Iteration
    ----- ----------- -----------
    ABCDE .1.2.3.4.5. 5
    ABCED .1.2.3.5.4. 5
    ABDCE .1.2.4.3.5. 5
    (snip)
    EDBCA .5.4.2.3.1. 5
    EDCAB .5.4.3.1.2. 5
    EDCBA .5.4.3.2.1. 5
    

    first posted a while ago here

    However, it would be better to do it in a better language such as C# or C++.
