Optimal way to concatenate/aggregate strings

前端 未结 7 1235
[愿得一人]
[愿得一人] 2020-11-22 11:17

I\'m finding a way to aggregate strings from different rows into a single row. I\'m looking to do this in many different places, so having a function to facilitate this woul

相关标签:
7条回答
  • 2020-11-22 11:48

    You can use += to concatenate strings, for example:

    declare @test nvarchar(max)
    set @test = ''
    select @test += name from names
    

    if you select @test, it will give you all names concatenated

    0 讨论(0)
  • 2020-11-22 11:52

    For those of us who found this and are not using Azure SQL Database:

    STRING_AGG() in PostgreSQL, SQL Server 2017 and Azure SQL
    https://www.postgresql.org/docs/current/static/functions-aggregate.html
    https://docs.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql

    GROUP_CONCAT() in MySQL
    http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_group-concat

    (Thanks to @Brianjorden and @milanio for Azure update)

    Example Code:

    select Id
    , STRING_AGG(Name, ', ') Names 
    from Demo
    group by Id
    

    SQL Fiddle: http://sqlfiddle.com/#!18/89251/1

    0 讨论(0)
  • 2020-11-22 11:54

    Although @serge answer is correct but i compared time consumption of his way against xmlpath and i found the xmlpath is so faster. I'll write the compare code and you can check it by yourself. This is @serge way:

    DECLARE @startTime datetime2;
    DECLARE @endTime datetime2;
    DECLARE @counter INT;
    SET @counter = 1;
    
    set nocount on;
    
    declare @YourTable table (ID int, Name nvarchar(50))
    
    WHILE @counter < 1000
    BEGIN
        insert into @YourTable VALUES (ROUND(@counter/10,0), CONVERT(NVARCHAR(50), @counter) + 'CC')
        SET @counter = @counter + 1;
    END
    
    SET @startTime = GETDATE()
    
    ;WITH Partitioned AS
    (
        SELECT 
            ID,
            Name,
            ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Name) AS NameNumber,
            COUNT(*) OVER (PARTITION BY ID) AS NameCount
        FROM @YourTable
    ),
    Concatenated AS
    (
        SELECT ID, CAST(Name AS nvarchar) AS FullName, Name, NameNumber, NameCount FROM Partitioned WHERE NameNumber = 1
    
        UNION ALL
    
        SELECT 
            P.ID, CAST(C.FullName + ', ' + P.Name AS nvarchar), P.Name, P.NameNumber, P.NameCount
        FROM Partitioned AS P
            INNER JOIN Concatenated AS C ON P.ID = C.ID AND P.NameNumber = C.NameNumber + 1
    )
    SELECT 
        ID,
        FullName
    FROM Concatenated
    WHERE NameNumber = NameCount
    
    SET @endTime = GETDATE();
    
    SELECT DATEDIFF(millisecond,@startTime, @endTime)
    --Take about 54 milliseconds
    

    And this is xmlpath way:

    DECLARE @startTime datetime2;
    DECLARE @endTime datetime2;
    DECLARE @counter INT;
    SET @counter = 1;
    
    set nocount on;
    
    declare @YourTable table (RowID int, HeaderValue int, ChildValue varchar(5))
    
    WHILE @counter < 1000
    BEGIN
        insert into @YourTable VALUES (@counter, ROUND(@counter/10,0), CONVERT(NVARCHAR(50), @counter) + 'CC')
        SET @counter = @counter + 1;
    END
    
    SET @startTime = GETDATE();
    
    set nocount off
    SELECT
        t1.HeaderValue
            ,STUFF(
                       (SELECT
                            ', ' + t2.ChildValue
                            FROM @YourTable t2
                            WHERE t1.HeaderValue=t2.HeaderValue
                            ORDER BY t2.ChildValue
                            FOR XML PATH(''), TYPE
                       ).value('.','varchar(max)')
                       ,1,2, ''
                  ) AS ChildValues
        FROM @YourTable t1
        GROUP BY t1.HeaderValue
    
    SET @endTime = GETDATE();
    
    SELECT DATEDIFF(millisecond,@startTime, @endTime)
    --Take about 4 milliseconds
    
    0 讨论(0)
  • 2020-11-22 11:57

    Update: Ms SQL Server 2017+, Azure SQL Database

    You can use: STRING_AGG.

    Usage is pretty simple for OP's request:

    SELECT id, STRING_AGG(name, ', ') AS names
    FROM some_table
    GROUP BY id
    

    Read More

    Well my old non-answer got rightfully deleted (left in-tact below), but if anyone happens to land here in the future, there is good news. They have implimented STRING_AGG() in Azure SQL Database as well. That should provide the exact functionality originally requested in this post with native and built in support. @hrobky mentioned this previously as a SQL Server 2016 feature at the time.

    --- Old Post: Not enough reputation here to reply to @hrobky directly, but STRING_AGG looks great, however it is only available in SQL Server 2016 vNext currently. Hopefully it will follow to Azure SQL Datababse soon as well..

    0 讨论(0)
  • 2020-11-22 12:00

    SOLUTION

    The definition of optimal can vary, but here's how to concatenate strings from different rows using regular Transact SQL, which should work fine in Azure.

    ;WITH Partitioned AS
    (
        SELECT 
            ID,
            Name,
            ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Name) AS NameNumber,
            COUNT(*) OVER (PARTITION BY ID) AS NameCount
        FROM dbo.SourceTable
    ),
    Concatenated AS
    (
        SELECT 
            ID, 
            CAST(Name AS nvarchar) AS FullName, 
            Name, 
            NameNumber, 
            NameCount 
        FROM Partitioned 
        WHERE NameNumber = 1
    
        UNION ALL
    
        SELECT 
            P.ID, 
            CAST(C.FullName + ', ' + P.Name AS nvarchar), 
            P.Name, 
            P.NameNumber, 
            P.NameCount
        FROM Partitioned AS P
            INNER JOIN Concatenated AS C 
                    ON P.ID = C.ID 
                    AND P.NameNumber = C.NameNumber + 1
    )
    SELECT 
        ID,
        FullName
    FROM Concatenated
    WHERE NameNumber = NameCount
    

    EXPLANATION

    The approach boils down to three steps:

    1. Number the rows using OVER and PARTITION grouping and ordering them as needed for the concatenation. The result is Partitioned CTE. We keep counts of rows in each partition to filter the results later.

    2. Using recursive CTE (Concatenated) iterate through the row numbers (NameNumber column) adding Name values to FullName column.

    3. Filter out all results but the ones with the highest NameNumber.

    Please keep in mind that in order to make this query predictable one has to define both grouping (for example, in your scenario rows with the same ID are concatenated) and sorting (I assumed that you simply sort the string alphabetically before concatenation).

    I've quickly tested the solution on SQL Server 2012 with the following data:

    INSERT dbo.SourceTable (ID, Name)
    VALUES 
    (1, 'Matt'),
    (1, 'Rocks'),
    (2, 'Stylus'),
    (3, 'Foo'),
    (3, 'Bar'),
    (3, 'Baz')
    

    The query result:

    ID          FullName
    ----------- ------------------------------
    2           Stylus
    3           Bar, Baz, Foo
    1           Matt, Rocks
    
    0 讨论(0)
  • 2020-11-22 12:13

    I found Serge's answer to be very promising, but I also encountered performance issues with it as-written. However, when I restructured it to use temporary tables and not include double CTE tables, the performance went from 1 minute 40 seconds to sub-second for 1000 combined records. Here it is for anyone who needs to do this without FOR XML on older versions of SQL Server:

    DECLARE @STRUCTURED_VALUES TABLE (
         ID                 INT
        ,VALUE              VARCHAR(MAX) NULL
        ,VALUENUMBER        BIGINT
        ,VALUECOUNT         INT
    );
    
    INSERT INTO @STRUCTURED_VALUES
    SELECT   ID
            ,VALUE
            ,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY VALUE) AS VALUENUMBER
            ,COUNT(*) OVER (PARTITION BY ID)    AS VALUECOUNT
    FROM    RAW_VALUES_TABLE;
    
    WITH CTE AS (
        SELECT   SV.ID
                ,SV.VALUE
                ,SV.VALUENUMBER
                ,SV.VALUECOUNT
        FROM    @STRUCTURED_VALUES SV
        WHERE   VALUENUMBER = 1
    
        UNION ALL
    
        SELECT   SV.ID
                ,CTE.VALUE + ' ' + SV.VALUE AS VALUE
                ,SV.VALUENUMBER
                ,SV.VALUECOUNT
        FROM    @STRUCTURED_VALUES SV
        JOIN    CTE 
            ON  SV.ID = CTE.ID
            AND SV.VALUENUMBER = CTE.VALUENUMBER + 1
    
    )
    SELECT   ID
            ,VALUE
    FROM    CTE
    WHERE   VALUENUMBER = VALUECOUNT
    ORDER BY ID
    ;
    
    0 讨论(0)
提交回复
热议问题