Regex pattern inside SQL Replace function?

梦谈多话 2020-11-22 12:42
SELECT REPLACE('100.00 GB', '%^(^-?\d*\.{0,1}\d+$)%', '');

I want to replace any markup between two

10 answers
  • 2020-11-22 12:47

    Here is a function I wrote to accomplish this based off of the previous answers.

    CREATE FUNCTION dbo.RepetitiveReplace
    (
        @P_String VARCHAR(MAX),
        @P_Pattern VARCHAR(MAX),        -- PATINDEX/LIKE pattern such as '%[^0-9.]%', not a regex
        @P_ReplaceString VARCHAR(MAX),
        @P_ReplaceLength INT = 1        -- number of characters replaced at each match
    )
    RETURNS VARCHAR(MAX)
    AS
    BEGIN
        DECLARE @Index INT;

        -- Get the position of the first match
        SET @Index = PATINDEX(@P_Pattern, @P_String);

        -- Caution: this never terminates if @P_ReplaceString itself matches @P_Pattern
        WHILE @Index > 0
        BEGIN
            -- Replace the matching character(s) at that position
            SET @P_String = STUFF(@P_String, @Index, @P_ReplaceLength, @P_ReplaceString);
            SET @Index = PATINDEX(@P_Pattern, @P_String);
        END

        RETURN @P_String;
    END;
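
    A minimal usage sketch, assuming the function above has been created in the dbo schema. The sample value comes from the question; the pattern '%[^0-9.]%' is an assumption about which characters should survive. The pattern uses PATINDEX/LIKE wildcard syntax rather than a regular expression, and a scalar function's default parameter has to be requested with the DEFAULT keyword:

    ```sql
    -- Strip everything except digits and the decimal point, one character at a time
    SELECT dbo.RepetitiveReplace('100.00 GB', '%[^0-9.]%', '', DEFAULT) AS Cleaned;
    -- returns 100.00
    ```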
    


    Edit:

    Originally I had a recursive function here, which does not play well with SQL Server: it has a nesting limit of 32 levels, so any attempt to make 32 or more replacements with the function fails with an error like the one below. Rather than making a server-level change to allow deeper nesting (which could be dangerous, for example by allowing never-ending loops), switching to a WHILE loop makes a lot more sense.

    Maximum stored procedure, function, trigger, or view nesting level exceeded (limit 32).

  • 2020-11-22 12:52

    I think a simpler and faster approach is to iterate over every character code (0 to 255) and remove each character that should not be kept:

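    DECLARE @i int;
    SET @i = 0;

    WHILE (@i < 256)
    BEGIN
        -- Remove every single-byte character except digits and the decimal point.
        -- [Table] and [Column] are placeholders, bracketed because both are reserved words.
        IF CHAR(@i) NOT IN ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '.')
            UPDATE [Table] SET [Column] = REPLACE([Column], CHAR(@i), '');

        SET @i = @i + 1;
    END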
    
  • 2020-11-22 12:55

    Instead of removing the found character by its position alone, using Replace(Column, BadFoundCharacter, '') could be substantially faster. Additionally, rather than removing only the next bad character found in each column, this removes every occurrence of that character.

    WHILE 1 = 1 BEGIN
        -- [Column] is a placeholder, bracketed because COLUMN is a reserved word
        UPDATE dbo.YourTable
        SET [Column] = REPLACE([Column], SUBSTRING([Column], PATINDEX('%[^0-9.-]%', [Column]), 1), '')
        WHERE [Column] LIKE '%[^0-9.-]%';
        IF @@ROWCOUNT = 0 BREAK;
    END;
    

    I am convinced this will work better than the accepted answer, if only because it does fewer operations. There are other ways that might also be faster, but I don't have time to explore those right now.
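
    To try the loop without touching real data, here is a small sketch against a scratch table. The temp table #Demo, its column Val, and the sample rows are made up for illustration; only the loop body follows the answer above.

    ```sql
    -- Hypothetical scratch table and rows, for illustration only
    CREATE TABLE #Demo (Val VARCHAR(50));
    INSERT INTO #Demo (Val) VALUES ('100.00 GB'), ('1,300 items');

    WHILE 1 = 1
    BEGIN
        -- Each pass removes every occurrence of the first bad character in each row
        UPDATE #Demo
        SET Val = REPLACE(Val, SUBSTRING(Val, PATINDEX('%[^0-9.-]%', Val), 1), '')
        WHERE Val LIKE '%[^0-9.-]%';

        -- Stop once no row contains a bad character any more
        IF @@ROWCOUNT = 0 BREAK;
    END;

    SELECT Val FROM #Demo;   -- 100.00 and 1300 (row order not guaranteed)

    DROP TABLE #Demo;
    ```

    The number of passes is bounded by the number of distinct bad characters in the worst row, not by the number of rows, which is where the saving over a per-character STUFF comes from.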

  • 2020-11-22 13:03

    I stumbled across this post while looking for something else, but thought I'd mention a solution I use that is far more efficient, and that really should be the default implementation of any function used with a set-based query: a cross-applied table function. The topic still seems to be active, so hopefully this is useful to someone.

    Example runtimes for a few of the answers so far, which rely on recursive set-based queries or scalar functions: on a test set of 1m rows, removing the characters from a random newid takes from 34s to 2m05s for the WHILE loop examples and from 1m3s to {forever} for the function examples.

    Using a table function with CROSS APPLY achieves the same goal in 10s. You may need to adjust it to suit your needs, for example the maximum length it handles (40 characters as written).

    Function:

    CREATE FUNCTION [dbo].[RemoveChars](@InputUnit VARCHAR(40))
    RETURNS TABLE
    AS
    RETURN
        (
            -- Seven seed rows; cross joined below they give 7 x 7 = 49 numbers,
            -- enough to index every position of a VARCHAR(40)
            WITH Numbers_prep(Number) AS
                (
                    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                )
            ,Numbers(Number) AS
                (
                    -- One number per character position in the input
                    SELECT TOP (ISNULL(LEN(@InputUnit),0))
                        ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                    FROM Numbers_prep a
                        CROSS JOIN Numbers_prep b
                )
            SELECT
                OutputUnit
            FROM
                (
                    -- Keep only the digits and concatenate them in their original order
                    SELECT
                        SUBSTRING(@InputUnit,Number,1)
                    FROM  Numbers
                    WHERE SUBSTRING(@InputUnit,Number,1) LIKE '%[0-9]%'
                    ORDER BY Number
                    FOR XML PATH('')
                ) Sub(OutputUnit)
        )
    

    Usage:

    UPDATE t
    SET [column] = o.OutputUnit
    FROM ##t t
    CROSS APPLY [dbo].[RemoveChars](t.[column]) o
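
    Before running the UPDATE, the function can be previewed with a plain SELECT. This is only a sketch: it assumes dbo.RemoveChars exists as defined above, and the inline VALUES rows and the alias names v and Raw are made up for illustration.

    ```sql
    -- Preview the cleaned values next to the originals without modifying any table
    SELECT v.Raw, o.OutputUnit
    FROM (VALUES ('100.00 GB'), ('abc123def456')) AS v(Raw)
    CROSS APPLY dbo.RemoveChars(v.Raw) AS o;
    -- '100.00 GB'    -> 10000   (the pattern keeps digits only, so the decimal point is dropped)
    -- 'abc123def456' -> 123456
    ```

    If the decimal point should survive, the character class in the function would need to become [0-9.].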
    
  • 2020-11-22 13:04

    Wrapping the solution inside a SQL function can be useful if you want to reuse it. I'm even applying it at the cell level, which is why I'm posting this as a separate answer:

    CREATE FUNCTION [dbo].[fnReplaceInvalidChars] (@string VARCHAR(300))
    RETURNS VARCHAR(300)
    AS
    BEGIN
        DECLARE @str VARCHAR(300) = @string;
        DECLARE @Pattern VARCHAR(20) = '%[^a-zA-Z0-9]%';  -- any non-alphanumeric character
        DECLARE @Pos INT = PATINDEX(@Pattern, @str);

        -- Remove one invalid character per iteration until none remain
        WHILE @Pos > 0
        BEGIN
            SET @str = STUFF(@str, @Pos, 1, '');
            SET @Pos = PATINDEX(@Pattern, @str);
        END

        RETURN @str;
    END
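
    A short usage sketch, assuming the function above has been created in the dbo schema; the sample strings are illustrative only:

    ```sql
    -- Letters and digits are kept; the decimal point and the space are removed
    SELECT dbo.fnReplaceInvalidChars('100.00 GB') AS Cleaned;   -- returns 10000GB
    SELECT dbo.fnReplaceInvalidChars('a-b_c!')    AS Cleaned;   -- returns abc
    ```

    Because the character class keeps letters, the unit suffix survives here; for the question's goal of keeping only the number, the pattern would have to be narrowed, for example to '%[^0-9.]%'.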
    
  • 2020-11-22 13:04

    I created this function to clean up a string in a time field that contained non-numeric characters. The time contained question marks when the minutes had not been entered, something like 20:??. The function loops through each character and replaces anything that is not a digit or a colon (such as ?) with a 0:

    CREATE FUNCTION [dbo].[CleanTime]
    (
        @intime nvarchar(10)   -- raw time text, e.g. '20:??'
    )
    RETURNS nvarchar(5)
    AS
    BEGIN
        DECLARE @ResultVar nvarchar(5);   -- assignments are silently truncated to 5 characters
        DECLARE @char char(1);
        DECLARE @i int = 1;

        -- Walk the input one character at a time
        WHILE @i <= LEN(@intime)
        BEGIN
            -- Keep digits and colons; replace anything else with '0'
            SELECT @char = CASE WHEN SUBSTRING(@intime, @i, 1) LIKE '%[0-9:]%' THEN SUBSTRING(@intime, @i, 1) ELSE '0' END;
            -- CONCAT treats the initial NULL in @ResultVar as an empty string
            SELECT @ResultVar = CONCAT(@ResultVar, @char);
            SET @i = @i + 1;
        END;

        RETURN @ResultVar;
    END
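
    A short usage sketch, assuming the function above has been created in the dbo schema; the second input is a made-up example:

    ```sql
    -- Every character that is not a digit or a colon becomes '0'
    SELECT dbo.CleanTime('20:??') AS Cleaned;   -- returns 20:00
    SELECT dbo.CleanTime('7:3x')  AS Cleaned;   -- returns 7:30
    ```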
    