How to strip all non-alphabetic characters from a string in SQL Server?

情深已故 2020-11-21 23:49

How could you remove all characters that are not alphabetic from a string?

What about non-alphanumeric?

Does this have to be a custom function or are there

18 answers
  • 2020-11-22 00:26

    Here's a solution that doesn't require creating a function or listing every character to replace. It uses a recursive WITH statement (a recursive CTE) together with PATINDEX to find unwanted characters. It replaces all unwanted characters in a column, up to 100 unique bad characters in any given string (e.g. "ABC123DEF234" contains 4 bad characters: 1, 2, 3 and 4). The limit of 100 is the default maximum number of recursions allowed in a WITH statement; it can be changed with the MAXRECURSION query hint. It does not limit the number of rows processed, which is bounded only by the available memory.
    If you don't want DISTINCT results, you can remove the two DISTINCT keywords from the code.

    -- Create some test data:
    SELECT * INTO #testData 
    FROM (VALUES ('ABC DEF,K.l(p)'),('123H,J,234'),('ABCD EFG')) as t(TXT)
    
    -- Actual query:
    -- Remove non-alpha chars: '%[^A-Z]%'
    -- Remove non-alphanumeric chars: '%[^A-Z0-9]%'
    DECLARE @BadCharacterPattern VARCHAR(250) = '%[^A-Z]%';
    
    WITH recurMain as (
        SELECT DISTINCT CAST(TXT AS VARCHAR(250)) AS TXT, PATINDEX(@BadCharacterPattern, TXT) AS BadCharIndex
        FROM #testData
        UNION ALL
        SELECT CAST(TXT AS VARCHAR(250)) AS TXT, PATINDEX(@BadCharacterPattern, TXT) AS BadCharIndex
        FROM (
            SELECT 
                CASE WHEN BadCharIndex > 0 
                    THEN REPLACE(TXT, SUBSTRING(TXT, BadCharIndex, 1), '')
                    ELSE TXT 
                END AS TXT
            FROM recurMain
            WHERE BadCharIndex > 0
        ) badCharFinder
    )
    SELECT DISTINCT TXT
    FROM recurMain
    WHERE BadCharIndex = 0;
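
    The 100-recursion ceiling mentioned above is only the default. As a minimal sketch, the same query can be run with the limit lifted through the MAXRECURSION query hint. The inline sample rows are hypothetical stand-ins for #testData, and the recursive member is folded into a single SELECT for brevity.

    ```sql
    DECLARE @BadCharacterPattern VARCHAR(250) = '%[^A-Z]%';

    WITH recurMain AS (
        -- Anchor: every input string with the position of its first bad character
        SELECT CAST(TXT AS VARCHAR(250)) AS TXT,
               PATINDEX(@BadCharacterPattern, TXT) AS BadCharIndex
        FROM (VALUES ('ABC DEF,K.l(p)'), ('123H,J,234')) AS t(TXT)  -- hypothetical sample rows
        UNION ALL
        -- Recursive step: remove every occurrence of that character, then look again
        SELECT CAST(REPLACE(TXT, SUBSTRING(TXT, BadCharIndex, 1), '') AS VARCHAR(250)),
               PATINDEX(@BadCharacterPattern, REPLACE(TXT, SUBSTRING(TXT, BadCharIndex, 1), ''))
        FROM recurMain
        WHERE BadCharIndex > 0
    )
    SELECT TXT
    FROM recurMain
    WHERE BadCharIndex = 0
    OPTION (MAXRECURSION 0);  -- 0 removes the limit; values from 0 to 32767 are accepted
    ```

    Removing the limit is safe here because every recursive step deletes at least one distinct character, so the recursion always terminates.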
    
  • 2020-11-22 00:29

    Use a CTE-generated numbers table to examine each character, then FOR XML to concatenate the kept characters back into a string:

    CREATE FUNCTION [dbo].[PatRemove](
        @pattern varchar(50),
        @expression varchar(8000) 
        )
    RETURNS varchar(8000)
    AS
    BEGIN
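        -- d: the digits 0-9; nums: 1 to 10000 via a four-way cross join; chars: one row per character
        WITH 
            d(d) AS (SELECT d FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) digits(d)),
            nums(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM d d1, d d2, d d3, d d4),
            chars(n, c) AS (SELECT n, SUBSTRING(@expression, n, 1) FROM nums WHERE n <= LEN(@expression))
        SELECT 
            -- Keep characters that do not match @pattern; ORDER BY n preserves the original character order
            @expression = (SELECT c AS [text()] FROM chars WHERE c NOT LIKE @pattern ORDER BY n FOR XML PATH(''));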
    
        RETURN @expression;
    END
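
    A possible way to call it, assuming the function above has already been created in the dbo schema and the database uses a case-insensitive collation. Because each character is tested on its own with NOT LIKE, the pattern describes a single character to remove and needs no % wildcards. The sample strings are only illustrative.

    ```sql
    -- Assumes dbo.PatRemove from the answer above exists in the current database.
    SELECT dbo.PatRemove('[^a-z]',    'ABC DEF,K.l(p)') AS LettersOnly,       -- drop everything except letters
           dbo.PatRemove('[^a-z0-9]', '123H,J,234')     AS LettersAndDigits;  -- drop everything except letters and digits
    ```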
    
  • 2020-11-22 00:30

    Believe it or not, on my system this ugly function performs better than G Mastros's elegant one.

    CREATE FUNCTION dbo.RemoveSpecialChar (@s VARCHAR(256)) 
    RETURNS VARCHAR(256) 
    WITH SCHEMABINDING
        BEGIN
            IF @s IS NULL
                RETURN NULL
            DECLARE @s2 VARCHAR(256) = '',
                    @l INT = LEN(@s),
                    @p INT = 1
    
            WHILE @p <= @l
                BEGIN
                    DECLARE @c INT
                    SET @c = ASCII(SUBSTRING(@s, @p, 1))
                    IF @c BETWEEN 48 AND 57
                       OR  @c BETWEEN 65 AND 90
                       OR  @c BETWEEN 97 AND 122
                        SET @s2 = @s2 + CHAR(@c)
                    SET @p = @p + 1
                END
    
            IF LEN(@s2) = 0
                RETURN NULL
    
            RETURN @s2
        END
    
  • 2020-11-22 00:30

    Here's another recursive CTE solution, based on @Gerhard Weiss's answer here. You should be able to copy and paste the whole code block into SSMS and play with it there. The results include a few extra columns to help show what's going on. It took me a while to understand everything that's happening with both PATINDEX (which uses LIKE-style wildcard patterns, not full regular expressions) and the recursive CTE.

    DECLARE @DefineBadCharPattern varchar(30)
    --Only the last SET below takes effect; comment out the ones you don't want
    SET @DefineBadCharPattern = '%[^A-z]%'  --Means anything NOT between the A and z characters (by ASCII value) is "bad"; note that this range also includes [ \ ] ^ _ and `
    SET @DefineBadCharPattern = '%[^a-z0-9]%'  --Means anything other than a through z or 0 through 9 (by ASCII value) is "bad"; with the binary collation used below, uppercase letters count as "bad" too
    SET @DefineBadCharPattern = '%[^ -~]%'  --Means anything NOT between the space and ~ characters (control characters and anything outside printable ASCII) is "bad"
    --Change @ReplaceBadCharWith to '' to strip "bad" characters from the string
    --Change it to some character if you want to 'see' what's being replaced. NOTE: It must be allowed according to @DefineBadCharPattern above
    DECLARE @ReplaceBadCharWith varchar(1) = '#'  --Change this to whatever you want to replace "bad" chars with 
    IF patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, @ReplaceBadCharWith) > 0
        BEGIN
            RAISERROR('@ReplaceBadCharWith value (%s) must be a character allowed by PATINDEX pattern of %s',16,1,@ReplaceBadCharWith, @DefineBadCharPattern)
            RETURN
        END
    --A table of values to play with:
    DECLARE @temp TABLE (OriginalString varchar(100))
    INSERT @temp SELECT ' 1hello' + char(13) + char(10) + 'there' + char(30) + char(9) + char(13) + char(10)
    INSERT @temp SELECT '2hello' + char(30) + 'there' + char(30)
    INSERT @temp SELECT ' 3hello there'
    INSERT @temp SELECT ' tab' + char(9) + ' character'
    INSERT @temp SELECT 'good bye'
    
    --Let the magic begin:
    ;WITH recurse AS (
        select
        OriginalString,
        OriginalString as CleanString,
        patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString) as [Position],
        substring(OriginalString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString),1) as [InvalidCharacter],
        ascii(substring(OriginalString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString),1)) as [ASCIICode]
        from @temp
       UNION ALL
        select
        OriginalString,
        CONVERT(varchar(100),REPLACE(CleanString,InvalidCharacter,@ReplaceBadCharWith)),
        patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) as [Position],
        substring(CleanString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString),1),
        ascii(substring(CleanString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString),1))
        from recurse
        where patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) > 0
    )
    SELECT * FROM recurse
    --optionally comment out this last WHERE clause to see more of what the recursion is doing:
    WHERE patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) = 0
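
    The difference between the A-z range and an explicit A-Za-z range under a binary collation can be checked with a small sketch like this one. The sample string 'abc_def' is made up for illustration.

    ```sql
    -- Under a binary collation, [A-z] spans character codes 65 through 122,
    -- which includes [ \ ] ^ _ ` (codes 91-96) as well as the letters.
    SELECT
        PATINDEX('%[^A-z]%'    COLLATE Latin1_General_BIN, 'abc_def') AS WideRange,    -- 0: the underscore is not flagged
        PATINDEX('%[^A-Za-z]%' COLLATE Latin1_General_BIN, 'abc_def') AS LettersOnly;  -- 4: the underscore is flagged
    ```

    If only letters should count as "good", spelling out both ranges avoids letting those six punctuation characters through.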
    
  • 2020-11-22 00:31

    Though this post is a bit old, I'd like to add the following. The issue I had with the solution above is that it does not filter out characters such as ç, ë and ï. I adapted a function as follows (I only used a varchar(80) string to save memory):

    CREATE FUNCTION dbo.udf_Cleanchars (@InputString varchar(80))
    RETURNS varchar(80)
    AS
    BEGIN
        DECLARE @return varchar(80), @length int, @counter int, @cur_char char(1)
        SET @return = ''
        SET @length = 0
        SET @counter = 1
        SET @length = LEN(@InputString)
        IF @length > 0
        BEGIN
            WHILE @counter <= @length
            BEGIN
                SET @cur_char = SUBSTRING(@InputString, @counter, 1)
                IF ((ASCII(@cur_char) IN (32,44,46))
                    OR (ASCII(@cur_char) BETWEEN 48 AND 57)
                    OR (ASCII(@cur_char) BETWEEN 65 AND 90)
                    OR (ASCII(@cur_char) BETWEEN 97 AND 122))
                BEGIN
                    SET @return = @return + @cur_char
                END
                SET @counter = @counter + 1
            END
        END

        RETURN @return
    END
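
    A possible way to try it, assuming the function above has already been created in the dbo schema. It keeps spaces (32), commas (44), periods (46), digits and unaccented letters, and drops everything else, including accented letters. The sample text is made up for illustration.

    ```sql
    -- Assumes dbo.udf_Cleanchars from the answer above exists in the current database.
    -- The input must fit in varchar(80); longer strings are truncated by the parameter type.
    SELECT dbo.udf_Cleanchars('Garçon, Noël & 42 naïve cafés.') AS Cleaned;
    ```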
    
  • 2020-11-22 00:33

    This approach didn't work for me because I was trying to keep Arabic letters. I tried changing the pattern, but that didn't work either. I wrote another function that works at the ASCII-code level, since that was my only option, and it worked.

     Create function [dbo].[RemoveNonAlphaCharacters] (@s varchar(4000)) returns varchar(4000)
       with schemabinding
    begin
       if @s is null
          return null
       declare @s2 varchar(4000)
       set @s2 = ''
       declare @l int
       set @l = len(@s)
       declare @p int
       set @p = 1
       while @p <= @l begin
          declare @c int
          set @c = ascii(substring(@s, @p, 1))
          if @c between 48 and 57 or @c between 65 and 90 or @c between 97 and 122 or @c between 165 and 253 or @c between 32 and 33
             set @s2 = @s2 + char(@c)
          set @p = @p + 1
          end
       if len(@s2) = 0
          return null
       return @s2
       end
    

    GO
