How to strip all non-alphabetic characters from string in SQL Server?

情深已故 2020-11-21 23:49

How could you remove all characters that are not alphabetic from a string?

What about non-alphanumeric?

Does this have to be a custom function or are there

18 Answers
  •  终归单人心
    2020-11-22 00:30

    Here's another recursive CTE solution, based on @Gerhard Weiss's answer. You should be able to copy and paste the whole code block into SSMS and play with it there. The results include a few extra columns to help show what's going on. It took me a while to understand everything that happens with both PATINDEX (which takes LIKE-style wildcard patterns, not true regular expressions) and the recursive CTE.

    DECLARE @DefineBadCharPattern varchar(30)
    --Three example patterns follow. Only the last SET takes effect, so comment out the ones you don't want.
    SET @DefineBadCharPattern = '%[^A-z]%'  --Anything NOT between A and z (by ASCII value) is "bad". NOTE: that range also contains [ \ ] ^ _ and `, so use '%[^A-Za-z]%' for letters only
    SET @DefineBadCharPattern = '%[^a-z0-9]%'  --Anything that is NOT a lowercase letter a-z or a digit 0-9 is "bad" (uppercase letters count as "bad" under the binary collation used below)
    SET @DefineBadCharPattern = '%[^ -~]%'  --Anything NOT between space and ~ (i.e. anything outside the printable ASCII range) is "bad"
    --Change @ReplaceBadCharWith to '' to strip "bad" characters from the string.
    --Change it to some character if you want to 'see' what's being replaced. NOTE: It must be allowed according to @DefineBadCharPattern above
    DECLARE @ReplaceBadCharWith varchar(1) = '#'  --Change this to whatever you want to replace "bad" chars with
    IF patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, @ReplaceBadCharWith) > 0
        BEGIN
            RAISERROR('@ReplaceBadCharWith value (%s) must be a character allowed by PATINDEX pattern of %s',16,1,@ReplaceBadCharWith, @DefineBadCharPattern)
            RETURN
        END
    --A table of values to play with:
    DECLARE @temp TABLE (OriginalString varchar(100))
    INSERT @temp SELECT ' 1hello' + char(13) + char(10) + 'there' + char(30) + char(9) + char(13) + char(10)
    INSERT @temp SELECT '2hello' + char(30) + 'there' + char(30)
    INSERT @temp SELECT ' 3hello there'
    INSERT @temp SELECT ' tab' + char(9) + ' character'
    INSERT @temp SELECT 'good bye'
    
    --Let the magic begin:
    ;WITH recurse AS (
        select
        OriginalString,
        OriginalString as CleanString,
        patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString) as [Position],
        substring(OriginalString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString),1) as [InvalidCharacter],
        ascii(substring(OriginalString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString),1)) as [ASCIICode]
        from @temp
       UNION ALL
        select
        OriginalString,
        CONVERT(varchar(100),REPLACE(CleanString,InvalidCharacter,@ReplaceBadCharWith)),
        patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) as [Position],
        substring(CleanString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString),1),
        ascii(substring(CleanString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString),1))
        from recurse
        where patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) > 0
    )
    SELECT * FROM recurse
    --optionally comment out this last WHERE clause to see more of what the recursion is doing:
    WHERE patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) = 0
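
    If you only need to clean a single value rather than inspect every step, the same PATINDEX idea can be sketched as a plain WHILE loop. This is not the code from the answer above, just a minimal illustration under a few assumptions: the variable names (@pattern, @s, @pos) and the sample string are made up for the example, and the pattern '%[^A-Za-z]%' is one possible way to say "not a letter" under a binary collation.

```sql
-- Hypothetical sample values, chosen only for illustration.
DECLARE @pattern varchar(30) = '%[^A-Za-z]%';   -- anything that is not an ASCII letter
DECLARE @s varchar(100) = 'abc123 DEF!';

-- Position of the first "bad" character, or 0 when there is none.
DECLARE @pos int = PATINDEX(@pattern COLLATE Latin1_General_BIN, @s);

WHILE @pos > 0
BEGIN
    -- Delete exactly one character at that position, then look again.
    SET @s = STUFF(@s, @pos, 1, '');
    SET @pos = PATINDEX(@pattern COLLATE Latin1_General_BIN, @s);
END

SELECT @s AS CleanString;   -- abcDEF
```

    Swapping the pattern for '%[^A-Za-z0-9]%' would keep digits as well, which covers the non-alphanumeric case from the question. One trade-off to keep in mind: a recursive CTE stops after 100 levels by default unless you add OPTION (MAXRECURSION n), whereas a loop like this has no such limit but handles only one string at a time.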
    
