Using T-SQL, return nth delimited element from a string

半阙折子戏 2020-11-22 14:45

I need to create a function that will return the nth element of a delimited string.

For a data migration project, I am converting JSON audit records stored in a S

11 Answers
  • 2020-11-22 14:55

    You can use STRING_SPLIT with ROW_NUMBER:

    SELECT value, idx FROM
    (
      SELECT
        value,
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) idx
      FROM STRING_SPLIT('Lorem ipsum dolor sit amet.', ' ')
    ) t
    WHERE idx=2
    

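    This returns the second element (idx=2): 'ipsum'. Note that STRING_SPLIT does not document the order of its output rows, so numbering them with ORDER BY (SELECT NULL) is not guaranteed to follow the order of the input string.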
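
    On SQL Server 2022 and Azure SQL Database, STRING_SPLIT accepts an optional third argument, enable_ordinal, which adds a 1-based ordinal column. A minimal sketch, assuming one of those versions is available (it will not parse on SQL Server 2016-2019):

    ```sql
    -- Assumption: SQL Server 2022+ or Azure SQL Database (enable_ordinal = 1)
    SELECT value
    FROM STRING_SPLIT('Lorem ipsum dolor sit amet.', ' ', 1)
    WHERE ordinal = 2;   -- ordinal is 1-based, so this picks 'ipsum'
    ```

    With the ordinal column, the position comes from the function itself instead of from ROW_NUMBER over an unspecified order.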

  • 2020-11-22 14:58

    This is the easiest answer to retrieve the 67 (type-safe!!):

    SELECT CAST('<x>' + REPLACE('1,222,2,67,888,1111',',','</x><x>') + '</x>' AS XML).value('/x[4]','int')
    

    In the following you will find examples of how to use this with variables for the string, the delimiter and the position (even for edge cases with XML-forbidden characters).

    The easy one

    This question is not about a string split approach, but about how to get the nth element. The easiest, fully inlineable way would be this IMO:

    This is a real one-liner to get part 2 delimited by a space:

    DECLARE @input NVARCHAR(100)=N'part1 part2 part3';
    SELECT CAST(N'<x>' + REPLACE(@input,N' ',N'</x><x>') + N'</x>' AS XML).value('/x[2]','nvarchar(max)')
    

    Variables can be used with sql:variable() or sql:column()

    Of course you can use variables for delimiter and position (use sql:column to retrieve the position directly from a query's value):

    DECLARE @dlmt NVARCHAR(10)=N' ';
    DECLARE @pos INT = 2;
    SELECT CAST(N'<x>' + REPLACE(@input,@dlmt,N'</x><x>') + N'</x>' AS XML).value('/x[sql:variable("@pos")][1]','nvarchar(max)')
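
    To illustrate the sql:column() variant mentioned above, here is a small sketch that reads both the string and the position from each row. The table variable @t and its columns id, csv and pos are made up for this example:

    ```sql
    -- Hypothetical sample data: one delimited string and one wanted position per row
    DECLARE @t TABLE (id INT, csv NVARCHAR(100), pos INT);
    INSERT INTO @t (id, csv, pos) VALUES (1, N'a,b,c', 2), (2, N'x,y,z', 3);

    SELECT t.id,
           x.doc.value('/x[sql:column("t.pos")][1]', 'nvarchar(max)') AS Part
    FROM @t AS t
    CROSS APPLY (SELECT CAST(N'<x>' + REPLACE(t.csv, N',', N'</x><x>') + N'</x>' AS XML)) AS x(doc);
    -- Expected: id 1 -> 'b', id 2 -> 'z'
    ```

    The CROSS APPLY only exists to give the casted XML a name; the same cast could be written inline as in the one-liner above.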
    

    Edge-Case with XML-forbidden characters

    If your string might include forbidden characters, you still can do it this way. Just use FOR XML PATH on your string first to replace all forbidden characters with the fitting escape sequence implicitly.

    It's a very special case if, additionally, your delimiter is the semicolon, because the escape sequences (such as &lt;) end in a semicolon themselves. In this case I first replace the delimiter with '#DLMT#' and replace that with the XML tags at the end:

    SET @input=N'Some <, > and &;Other äöü@€;One more';
    SET @dlmt=N';';
    SELECT CAST(N'<x>' + REPLACE((SELECT REPLACE(@input,@dlmt,'#DLMT#') AS [*] FOR XML PATH('')),N'#DLMT#',N'</x><x>') + N'</x>' AS XML).value('/x[sql:variable("@pos")][1]','nvarchar(max)');
    

    UPDATE for SQL-Server 2016+

    Regretfully the developers forgot to return the part's index with STRING_SPLIT. But, using SQL-Server 2016+, there is JSON_VALUE and OPENJSON.

    With JSON_VALUE we can pass in the position as the array index.

    For OPENJSON the documentation states clearly:

    When OPENJSON parses a JSON array, the function returns the indexes of the elements in the JSON text as keys.

    A string like 1,2,3 needs nothing more than brackets: [1,2,3].
    A string of words like 'this is an example' needs to be ["this","is","an","example"].
    These are very easy string operations. Just try it out:

    DECLARE @str VARCHAR(100)='Hello John Smith';
    DECLARE @position INT = 2;
    
    --We can build the json-path '$[1]' using CONCAT
    SELECT JSON_VALUE('["' + REPLACE(@str,' ','","') + '"]',CONCAT('$[',@position-1,']'));
    

    --See this for a position safe string-splitter (zero-based):

    SELECT  JsonArray.[key] AS [Position]
           ,JsonArray.[value] AS [Part]
    FROM OPENJSON('["' + REPLACE(@str,' ','","') + '"]') JsonArray
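
    The JSON snippets above assume the input contains no double quotes, backslashes or control characters, since those would break the generated array. A hedged sketch that escapes the input with STRING_ESCAPE (SQL Server 2016+) first; the sample string and the '|' delimiter are chosen only for illustration, and a delimiter that STRING_ESCAPE itself rewrites (for example '/' or '\') would need extra handling:

    ```sql
    DECLARE @str NVARCHAR(100) = N'say "hi"|C:\temp|third';
    DECLARE @position INT = 2;   -- 1-based position wanted

    -- STRING_ESCAPE(..., 'json') escapes quotes, backslashes and control characters
    DECLARE @json NVARCHAR(MAX) =
        N'["' + REPLACE(STRING_ESCAPE(@str, 'json'), N'|', N'","') + N'"]';

    SELECT [value] AS Part
    FROM OPENJSON(@json)
    WHERE CAST([key] AS INT) = @position - 1;   -- OPENJSON keys are zero-based
    -- Expected: C:\temp
    ```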
    

    In this post I tested various approaches and found that OPENJSON is really fast, even much faster than the famous "delimitedSplit8k()" method.

    UPDATE 2 - Get the values type-safe

    We can use an array within an array simply by using doubled [[]]. This allows for a typed WITH-clause:

    DECLARE  @SomeDelimitedString VARCHAR(100)='part1|1|20190920';
    
    DECLARE @JsonArray NVARCHAR(MAX)=CONCAT('[["',REPLACE(@SomeDelimitedString,'|','","'),'"]]');
    
    SELECT @SomeDelimitedString          AS TheOriginal
          ,@JsonArray                    AS TransformedToJSON
          ,ValuesFromTheArray.*
    FROM OPENJSON(@JsonArray)
    WITH(TheFirstFragment VARCHAR(100) '$[0]'
        ,TheSecondFragment INT '$[1]'
        ,TheThirdFragment DATE '$[2]') ValuesFromTheArray
    
  • 2020-11-22 14:59

    I cannot comment on Gary's solution because of my low reputation

    I know Gary was referencing another link.

    I have struggled to understand why we need this variable

    @ld INT = LEN(@Delimiter)
    

    I also don't understand why CHARINDEX has to start at position @ld, the length of the delimiter.

    I tested many examples with a single-character delimiter, and they work. Most of the time the delimiter is a single character. However, since the developer included @ld as the length of the delimiter, the code has to work for delimiters of more than one character.

    In that case, the following input fails:

    11,,,22,,,33,,,44,,,55,,,

    I adapted the code from this link: http://codebetter.com/raymondlewallen/2005/10/26/quick-t-sql-to-parse-a-delimited-string/

    I have tested various scenarios, including delimiters that have more than one character.

    CREATE FUNCTION [dbo].[split1]
    (
        @string1 VARCHAR(8000) -- List of delimited items
        , @Delimiter VARCHAR(40) = ',' -- delimiter that separates items
        , @ElementNumber int -- 1-based position of the item to return
    )
    RETURNS varchar(8000)
    AS
    BEGIN
        declare @position int
        declare @piece varchar(8000)=''
        declare @returnVal varchar(8000)=''
        -- note: PATINDEX treats %, _ and [ inside the delimiter as wildcards
        declare @Pattern varchar(50) = '%' + @Delimiter + '%'
        declare @counter int =0
        -- note: LEN ignores trailing spaces, so a delimiter ending in a space is not supported
        declare @ld int = len(@Delimiter)
        declare @ls1 int = len (@string1)
        declare @foundit int = 0

        -- no delimiter at all: nothing to split
        if patindex(@Pattern , @string1) = 0
            return  ''

        -- make sure the string ends with the (possibly multi-character) delimiter
        if right(rtrim(@string1), @ld) <> @Delimiter
            set @string1 = @string1  + @Delimiter

        -- @position = index of the last character of the first delimiter
        set @position =  patindex(@Pattern , @string1) + @ld  -1
        while @position > 0
        begin
            set @counter = @counter +1
            set @ls1  = len (@string1)
            if (@ls1 >= @ld)
                set @piece = left(@string1, @position - @ld) -- text before the delimiter
            else
                break
            if (@counter = @ElementNumber)
            begin
                set @foundit = 1
                break
            end
            if len(@string1) > 0
            begin
                -- drop the piece and its delimiter, then look for the next delimiter
                set @string1 = stuff(@string1, 1, @position, '')
                set @position =  patindex(@Pattern , @string1) + @ld  -1
            end
            else
                set @position = -1
        end

        if @foundit =1
            set @returnVal = @piece
        else
            set @returnVal =  ''
        return @returnVal
    END
    
  • 2020-11-22 15:03

    Here is my initial solution... It is based on work by Aaron Bertrand http://www.sqlperformance.com/2012/07/t-sql-queries/split-strings

    I simply changed the return type to make it a scalar function.

    Example: SELECT dbo.GetSplitString_CTE('1,222,2,67,888,1111',',',4)

    CREATE FUNCTION dbo.GetSplitString_CTE
    (
       @List       VARCHAR(MAX),
       @Delimiter  VARCHAR(255),
       @ElementNumber int
    )
    RETURNS VARCHAR(4000)
    AS
    BEGIN
    
       DECLARE @result varchar(4000)    
       DECLARE @Items TABLE ( position int IDENTITY PRIMARY KEY,
                              Item VARCHAR(4000)
                             )  
    
       DECLARE @ll INT = LEN(@List) + 1, @ld INT = LEN(@Delimiter);  
    
       WITH a AS
       (
           SELECT
               [start] = 1,
               [end]   = COALESCE(NULLIF(CHARINDEX(@Delimiter, 
                           @List, @ld), 0), @ll),
               [value] = SUBSTRING(@List, 1, 
                         COALESCE(NULLIF(CHARINDEX(@Delimiter, 
                           @List, @ld), 0), @ll) - 1)
           UNION ALL
           SELECT
               [start] = CONVERT(INT, [end]) + @ld,
               [end]   = COALESCE(NULLIF(CHARINDEX(@Delimiter, 
                           @List, [end] + @ld), 0), @ll),
               [value] = SUBSTRING(@List, [end] + @ld, 
                         COALESCE(NULLIF(CHARINDEX(@Delimiter, 
                           @List, [end] + @ld), 0), @ll)-[end]-@ld)
           FROM a
           WHERE [end] < @ll
       )
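       INSERT @Items SELECT [value]
       FROM a
       WHERE LEN([value]) > 0   -- empty elements are skipped, so positions count non-empty items only
       ORDER BY [start]         -- assign the IDENTITY positions in string order
       OPTION (MAXRECURSION 0);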
    
       SELECT @result=Item
       FROM @Items
       WHERE position=@ElementNumber
    
       RETURN @result;
    END
    GO
    
  • 2020-11-22 15:06

    Alternatively, one can use xml, nodes() and ROW_NUMBER. We can order the elements based on their document order. For example:

    DECLARE @Input VARCHAR(100) = '1a,2b,3c,4d,5e,6f,7g,8h'
           ,@Number TINYINT = 3
    
    DECLARE @XML XML;
    DECLARE @value VARCHAR(100);
    
    SET @XML = CAST('<x>' + REPLACE(@Input,',','</x><x>') + '</x>' AS XML);
    
    WITH DataSource ([rowID], [rowValue]) AS
    (
        SELECT ROW_NUMBER() OVER (ORDER BY T.c ASC) 
                ,T.c.value('.', 'VARCHAR(100)')
        FROM @XML.nodes('./x') T(c)
    )
    SELECT @value = [rowValue]
    FROM DataSource
    WHERE [rowID] = @Number;
    
    SELECT @value;
    
  • 2020-11-22 15:10

    @a - the value (e.g. 'a/bb/ccc/ffffdd/ee/ff/....')

    @p - the desired position (1,2,3...)

    @d - the delimiter ('/')

    trim(substring(replace(@a,@d,replicate(' ',len(@a))),(@p-1)*len(@a)+1,len(@a)))

    Each delimiter is replaced by len(@a) spaces, so the pth part always falls inside the pth window of len(@a) characters. The only problem: if the desired part has leading or trailing blanks, they get trimmed.

    Based entirely on this article: https://exceljet.net/formula/split-text-with-delimiter
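
    For anyone who wants to try the expression, here is a runnable sketch with sample values filled in. LTRIM(RTRIM(...)) is used instead of TRIM only as an assumption that the server may be older than SQL Server 2017:

    ```sql
    DECLARE @a VARCHAR(100) = 'a/bb/ccc/ffffdd/ee/ff';  -- the value
    DECLARE @p INT = 3;                                 -- the desired position
    DECLARE @d VARCHAR(1) = '/';                        -- the delimiter

    -- Pad every delimiter to LEN(@a) spaces, cut out the @p-th window, trim the padding
    SELECT LTRIM(RTRIM(SUBSTRING(
               REPLACE(@a, @d, REPLICATE(' ', LEN(@a))),
               (@p - 1) * LEN(@a) + 1,
               LEN(@a)))) AS Part;
    -- Expected: ccc
    ```

    The padded string grows with roughly the square of the input length, and for non-MAX types it is truncated at 8000 bytes, so this is best suited to short strings.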
