How to strip all non-alphabetic characters from string in SQL Server?

情深已故 2020-11-21 23:49

How could you remove all characters that are not alphabetic from a string?

What about non-alphanumeric?

Does this have to be a custom function or are there

  • 2020-11-22 00:35

    This solution, inspired by Mr. Allen's solution, requires a Numbers table of integers (which you should have on hand anyway if you want to do serious query work with good performance). It does not require a CTE. Change the NOT IN (...) list to exclude other characters, or change it to an IN (...) or LIKE test to keep only certain characters. The ORDER BY N guarantees the characters are concatenated in their original order, and the TYPE directive plus .value('.', 'nvarchar(max)') stops FOR XML from turning characters such as & and < into &amp; and &lt;.

    SELECT (
        SELECT  SUBSTRING([YourString], N, 1)
        FROM    dbo.Numbers
        WHERE   N > 0 AND N <= LEN([YourString])
            AND SUBSTRING([YourString], N, 1) NOT IN ('(',')',',','.')
        ORDER BY N
        FOR XML PATH(''), TYPE
    ).value('.', 'nvarchar(max)') AS [YourStringTransformed]
    FROM ...
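
    To make the set-based idea concrete, here is a minimal Python sketch (an illustration of the logic, not SQL Server code). range(1, len(s) + 1) plays the role of the dbo.Numbers rows, each position yields a one-character substring, a NOT IN blacklist filters them, and the survivors are concatenated in position order, which is what the FOR XML PATH('') concatenation is meant to produce. The name strip_with_numbers and the default blacklist are assumptions taken from the example above.

```python
def strip_with_numbers(s: str, blacklist=("(", ")", ",", ".")) -> str:
    """Mimic the Numbers-table query: one row per position N, keep rows whose
    character is NOT IN the blacklist, then concatenate in order of N."""
    # range(1, len(s) + 1) stands in for: WHERE N > 0 AND N <= LEN(s)
    rows = [(n, s[n - 1]) for n in range(1, len(s) + 1)]   # SUBSTRING(s, N, 1)
    kept = [(n, c) for n, c in rows if c not in blacklist]  # NOT IN (...)
    kept.sort(key=lambda row: row[0])                       # order by N
    return "".join(c for _, c in kept)                      # concatenate


print(strip_with_numbers("(555) 123,4567."))  # prints "555 1234567"
```

    The sort is redundant in Python because the list is already in order, but it is kept to mirror the point that SQL only guarantees order when you ask for it.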
    
  • 2020-11-22 00:37

    I knew SQL was bad at string manipulation, but I didn't think it would be this difficult. Here's a simple function that strips all the digits out of a string. Note that, despite its name, AlphaOnly removes only the digits 0 to 9; punctuation and spaces are left in place. There are probably better ways to do this, but it's a start.

    CREATE FUNCTION dbo.AlphaOnly (
        @String varchar(100)
    )
    RETURNS varchar(100)
    AS BEGIN
      RETURN (
        REPLACE(
          REPLACE(
            REPLACE(
              REPLACE(
                REPLACE(
                  REPLACE(
                    REPLACE(
                      REPLACE(
                        REPLACE(
                          REPLACE(
                            @String,
                          '9', ''),
                        '8', ''),
                      '7', ''),
                    '6', ''),
                  '5', ''),
                '4', ''),
              '3', ''),
            '2', ''),
          '1', ''),
        '0', '')
      )
    END
    GO
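
    The nested REPLACE calls simply apply one replacement per digit. A minimal Python sketch of the same chain (the name alpha_only mirrors dbo.AlphaOnly; it is an illustration, not SQL Server code) also makes clear that only digits are removed, so punctuation and spaces survive despite the function's name.

```python
def alpha_only(s: str) -> str:
    """Equivalent of dbo.AlphaOnly: REPLACE each digit '9'..'0' with ''."""
    for digit in "9876543210":  # same order as the innermost-to-outermost REPLACEs
        s = s.replace(digit, "")
    return s


print(alpha_only("abc1234567890"))  # prints "abc"
print(alpha_only("a-1 b"))          # prints "a- b": non-digits are untouched
```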
    
    -- ==================
    DECLARE @t TABLE (
        ColID       int,
        ColString   varchar(50)
    )
    
    INSERT INTO @t VALUES (1, 'abc1234567890')
    
    SELECT ColID, ColString, dbo.AlphaOnly(ColString)
    FROM @t
    

    Output

    ColID ColString
    ----- ------------- ---
        1 abc1234567890 abc
    

    Round 2 - Data-Driven Blacklist

    -- ============================================
    -- Create a table of blacklist characters
    -- ============================================
    IF EXISTS (SELECT * FROM sys.tables WHERE [object_id] = OBJECT_ID('dbo.CharacterBlacklist'))
      DROP TABLE dbo.CharacterBlacklist
    GO
    CREATE TABLE dbo.CharacterBlacklist (
        CharID              int         IDENTITY,
        DisallowedCharacter nchar(1)    NOT NULL
    )
    GO
    INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'0')
    INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'1')
    INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'2')
    INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'3')
    INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'4')
    INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'5')
    INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'6')
    INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'7')
    INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'8')
    INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'9')
    GO
    
    -- ====================================
    IF EXISTS (SELECT * FROM sys.objects WHERE [object_id] = OBJECT_ID('dbo.StripBlacklistCharacters'))
      DROP FUNCTION dbo.StripBlacklistCharacters
    GO
    CREATE FUNCTION dbo.StripBlacklistCharacters (
        @String nvarchar(100)
    )
    RETURNS nvarchar(100)   -- match the input type so Unicode characters survive
    AS BEGIN
      DECLARE @id  int
      DECLARE @c   nchar(1)
    
      -- Walk the blacklist in CharID order; this still works
      -- if the IDENTITY values have gaps (e.g. after a DELETE).
      SET @id = 0
      WHILE 1 = 1 BEGIN
        SELECT TOP (1) @id = CharID, @c = DisallowedCharacter
        FROM dbo.CharacterBlacklist
        WHERE CharID > @id
        ORDER BY CharID
    
        IF @@ROWCOUNT = 0 BREAK
    
        SET @String = REPLACE(@String, @c, N'')
      END
    
      RETURN (@String)
    END
    GO
    
    -- ====================================
    DECLARE @s  nvarchar(24)
    SET @s = N'abc1234def5678ghi90jkl'
    
    SELECT
        @s                  AS OriginalString,
        dbo.StripBlacklistCharacters(@s)   AS ResultString
    

    Output

    OriginalString           ResultString
    ------------------------ ------------
    abc1234def5678ghi90jkl   abcdefghijkl
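
    The data-driven version can be sketched in Python as follows, assuming the blacklist arrives as a list of (CharID, DisallowedCharacter) rows like dbo.CharacterBlacklist. Iterating over the rows themselves, in CharID order, means gaps in the IDENTITY values do not matter. The function name strip_blacklist is hypothetical.

```python
def strip_blacklist(s: str, blacklist_rows) -> str:
    """Apply REPLACE(s, DisallowedCharacter, '') once per blacklist row."""
    # Sort by CharID and iterate over the actual rows, so IDENTITY gaps are harmless.
    for _char_id, disallowed in sorted(blacklist_rows):
        s = s.replace(disallowed, "")
    return s


# Rows shaped like dbo.CharacterBlacklist, with deliberately non-contiguous IDs 1, 11, 21, ...
rows = [(i * 10 + 1, str(i)) for i in range(10)]
print(strip_blacklist("abc1234def5678ghi90jkl", rows))  # prints "abcdefghijkl"
```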
    

    My challenge to readers: Can you make this more efficient? What about using recursion?
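
    As one possible take on the recursion question, here is a Python sketch of what a recursive approach does: each level peels off the first character and keeps it only if it is not blacklisted. In SQL Server the same shape would be a recursive CTE, which is not shown here; strip_recursive is purely illustrative. Recursion depth is limited (Python's default recursion limit is about 1000, and a SQL Server recursive CTE defaults to MAXRECURSION 100), so this is not automatically more efficient than the loop.

```python
def strip_recursive(s: str, blacklist: str = "0123456789") -> str:
    """Recursively rebuild s, dropping characters found in blacklist."""
    if not s:  # anchor case: nothing left to process
        return ""
    head = "" if s[0] in blacklist else s[0]
    return head + strip_recursive(s[1:], blacklist)  # recursive step on the tail


print(strip_recursive("abc1234def5678ghi90jkl"))  # prints "abcdefghijkl"
```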

  • 2020-11-22 00:37

    Having looked at all the given solutions, I thought there had to be a pure SQL method that doesn't require a function or a CTE/XML query, and doesn't involve hard-to-maintain nested REPLACE statements. Here is my solution:

    SELECT 
      x
      ,CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 1, 1) + '%' THEN '' ELSE SUBSTRING(x, 1, 1) END
        + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 2, 1) + '%' THEN '' ELSE SUBSTRING(x, 2, 1) END
        + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 3, 1) + '%' THEN '' ELSE SUBSTRING(x, 3, 1) END
        + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 4, 1) + '%' THEN '' ELSE SUBSTRING(x, 4, 1) END
        + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 5, 1) + '%' THEN '' ELSE SUBSTRING(x, 5, 1) END
        + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 6, 1) + '%' THEN '' ELSE SUBSTRING(x, 6, 1) END
    -- Keep adding rows until you reach the column size 
        AS stripped_column
    FROM (SELECT 
            column_to_strip AS x
            ,'ABCDEFGHIJKLMNOPQRSTUVWXYZ' AS a 
          FROM my_table) a
    

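    The advantage of doing it this way is that the valid characters are all contained in a single string in the subquery, which makes it easy to reconfigure for a different set of characters. One caveat: % and _ are LIKE wildcards, so if they appear in x they slip through the NOT LIKE test. Writing each line as CASE WHEN CHARINDEX(SUBSTRING(x, n, 1), a) = 0 THEN '' ELSE SUBSTRING(x, n, 1) END avoids that.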
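
    The per-position CASE logic can be modeled in Python as below (a sketch; width stands for the number of CASE lines, here 6 as in the SQL above, and whitelist_fixed_width is a hypothetical name). Positions past the end of the string produce an empty substring, which is why padding the expression out to the full column width is harmless. The sketch is case-sensitive; under a case-insensitive collation SQL Server would also keep lowercase letters.

```python
WHITELIST = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
WIDTH = 6  # one CASE expression per character position, up to the column size


def whitelist_fixed_width(x: str, a: str = WHITELIST, width: int = WIDTH) -> str:
    """Concatenate one CASE result per position 1..width, like the SQL above."""
    parts = []
    for n in range(1, width + 1):
        c = x[n - 1:n]  # SUBSTRING(x, n, 1); '' once n is past the end
        # CASE WHEN a NOT LIKE '%' + c + '%' THEN '' ELSE c END
        # ('' in a is True, just as LIKE '%%' matches, so past-the-end adds '')
        parts.append(c if c in a else "")
    return "".join(parts)


print(whitelist_fixed_width("AB-1C"))     # prints "ABC"
print(whitelist_fixed_width("ABCDEFGH"))  # prints "ABCDEF": characters past WIDTH are lost
```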

    The downside is that you have to add a line of SQL for each character position, up to the size of your column. To make that easier I used the PowerShell script below; this example is for a VARCHAR(64):

    1..64 | % {
      "    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, {0}, 1) + '%' THEN '' ELSE SUBSTRING(x, {0}, 1) END" -f $_
    } | clip.exe
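
    If PowerShell isn't handy, the same lines can be generated with a few lines of Python (a sketch; case_lines is a hypothetical helper). The script above prefixes every line with +, including the first; this version leaves the + off the first term so it needs no hand-editing.

```python
def case_lines(width: int) -> str:
    """Generate one CASE expression per character position 1..width."""
    template = ("CASE WHEN a NOT LIKE '%' + SUBSTRING(x, {0}, 1) + '%' "
                "THEN '' ELSE SUBSTRING(x, {0}, 1) END")
    lines = []
    for n in range(1, width + 1):
        prefix = "    " if n == 1 else "    + "  # no '+' before the first term
        lines.append(prefix + template.format(n))
    return "\n".join(lines)


print(case_lines(64))  # 64 lines, for a VARCHAR(64) column
```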
    
  • 2020-11-22 00:38

    If, like me, you don't have permission to add functions to your production database but still want to do this kind of filtering, here's a pure SQL solution that uses PIVOT to put the filtered pieces back together.

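    N.B. I hard-coded the pivot for strings of up to 40 characters; you'll have to add more columns if you have longer strings to filter. master..spt_values (type = 'P') is an undocumented system table that serves here as a numbers table of 0 to 2047. SET CONCAT_NULL_YIELDS_NULL OFF is needed because positions with no surviving character come back from the PIVOT as NULL; be aware that this setting is deprecated.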

    SET CONCAT_NULL_YIELDS_NULL OFF;
    
    with 
        ToBeScrubbed
    as (
        select 1 as id, '*SOME 222@ !@* #* BOGUS !@*&! DATA' as ColumnToScrub
    ),
    
    Scrubbed as (
        select 
            P.Number as ValueOrder,
            isnull ( substring ( t.ColumnToScrub , number , 1 ) , '' ) as ScrubbedValue,
            t.id
        from
            ToBeScrubbed t
            left join master..spt_values P
                on P.number between 1 and len(t.ColumnToScrub)
                and type ='P'
        where
            PatIndex('%[^a-z]%', substring(t.ColumnToScrub,P.number,1) ) = 0
    )
    
    SELECT
        id, 
        [1]+ [2]+ [3]+ [4]+ [5]+ [6]+ [7]+ [8] +[9] +[10]
        +  [11]+ [12]+ [13]+ [14]+ [15]+ [16]+ [17]+ [18] +[19] +[20]
        +  [21]+ [22]+ [23]+ [24]+ [25]+ [26]+ [27]+ [28] +[29] +[30]
        +  [31]+ [32]+ [33]+ [34]+ [35]+ [36]+ [37]+ [38] +[39] +[40] as ScrubbedData
    FROM (
        select 
            *
        from 
            Scrubbed
        ) 
        src
        PIVOT (
            MAX(ScrubbedValue) FOR ValueOrder IN (
            [1], [2], [3], [4], [5], [6], [7], [8], [9], [10],
            [11], [12], [13], [14], [15], [16], [17], [18], [19], [20],
            [21], [22], [23], [24], [25], [26], [27], [28], [29], [30],
            [31], [32], [33], [34], [35], [36], [37], [38], [39], [40]
            )
        ) pvt
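
    The CTE + PIVOT pipeline boils down to three steps: explode the string into (position, character) rows, filter them with the [^a-z] pattern, and reassemble the survivors in position order. Here is a Python sketch of those steps; it uses the re module in place of PatIndex and the pattern [^A-Za-z] to stand in for [^a-z] under a case-insensitive collation (both are assumptions), and scrub is a hypothetical name.

```python
import re

# Equivalent of PatIndex('%[^a-z]%', ...) under a case-insensitive collation
NON_ALPHA = re.compile(r"[^A-Za-z]")


def scrub(column_to_scrub: str) -> str:
    # Scrubbed CTE: one (ValueOrder, ScrubbedValue) row per position ...
    rows = [(n, column_to_scrub[n - 1]) for n in range(1, len(column_to_scrub) + 1)]
    # ... keeping only rows where the pattern finds no non-letter.
    rows = [(n, c) for n, c in rows if NON_ALPHA.search(c) is None]
    # PIVOT + [1] + [2] + ...: concatenate in position order; missing positions add nothing.
    return "".join(c for n, c in sorted(rows))


print(scrub("*SOME 222@ !@* #* BOGUS !@*&! DATA"))  # prints "SOMEBOGUSDATA"
```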
    
  • 2020-11-22 00:40

    If you're using Oracle, this is built in as of 10g. I had to strip all the special characters out to compare phone numbers:

    regexp_replace(c.phone, '[^0-9]', '')
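
    For comparison, the same digits-only cleanup in Python uses re.sub with the identical pattern. This only illustrates what the Oracle call does; digits_only is a hypothetical name.

```python
import re


def digits_only(phone: str) -> str:
    """Same as Oracle's regexp_replace(phone, '[^0-9]', ''): drop every non-digit."""
    return re.sub(r"[^0-9]", "", phone)


# Normalize both sides before comparing phone numbers.
print(digits_only("(555) 123-4567") == digits_only("555.123.4567"))  # prints True
```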
    
  • 2020-11-22 00:41

    In the custom function RemoveNonAlphaCharacters posted in another answer, I changed the pattern in both places where PatIndex is called to

    PatIndex('%[^A-Za-z0-9]%', @Temp)
    

    and renamed the function RemoveNonAlphaNumericCharacters, so it keeps digits as well as letters.
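
    The function itself isn't shown on this page, so the loop structure here is an assumption: find the first offending character with PatIndex, delete it with STUFF, and repeat until PatIndex returns 0. A Python sketch of that loop with the alphanumeric pattern (remove_non_alphanumeric is a hypothetical name):

```python
import re

# PatIndex('%[^A-Za-z0-9]%', @Temp): first character that is not a letter or digit
NON_ALNUM = re.compile(r"[^A-Za-z0-9]")


def remove_non_alphanumeric(temp: str) -> str:
    """Repeatedly locate the first non-alphanumeric character and delete it."""
    match = NON_ALNUM.search(temp)
    while match is not None:  # PatIndex(...) > 0
        i = match.start()
        temp = temp[:i] + temp[i + 1:]  # STUFF(@Temp, PatIndex(...), 1, '')
        match = NON_ALNUM.search(temp)
    return temp


print(remove_non_alphanumeric("Hello, World! 123"))  # prints "HelloWorld123"
```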
