SQL Server: How do you remove punctuation from a field?

拈花ヽ惹草 提交于 2019-12-03 02:40:44

Ideally, you would do this in an application language such as C# + LINQ as mentioned above.

If you wanted to do it purely in T-SQL though, one way make things neater would be to firstly create a table that held all the punctuation you wanted to removed.

CREATE TABLE Punctuation 
(
    Symbol VARCHAR(1) NOT NULL
)

INSERT INTO Punctuation (Symbol) VALUES('''')
INSERT INTO Punctuation (Symbol) VALUES('-')
INSERT INTO Punctuation (Symbol) VALUES('.')

Next, you could create a function in SQL to remove all the punctuation symbols from an input string.

CREATE FUNCTION dbo.fn_RemovePunctuation
(
    @InputString VARCHAR(500)
)
RETURNS VARCHAR(500)
AS
BEGIN
    SELECT
        @InputString = REPLACE(@InputString, P.Symbol, '')
    FROM 
        Punctuation P

    RETURN @InputString
END
GO

Then you can just call the function in your UPDATE statement

UPDATE tblMyTable SET FieldName = dbo.fn_RemovePunctuation(FieldName)

I wanted to avoid creating a table and wanted to remove everything except letters and digits.

DECLARE @p int
DECLARE @Result Varchar(250)
DECLARE @BadChars Varchar(12)
SELECT @BadChars = '%[^a-z0-9]%'
-- to leave spaces - SELECT @BadChars = '%[^a-z0-9] %'

SET @Result = @InStr

SET @P =PatIndex(@BadChars,@Result)
WHILE @p > 0 BEGIN
    SELECT @Result = Left(@Result,@p-1) + Substring(@Result,@p+1,250)
    SET @P =PatIndex(@BadChars,@Result)
    END

I am proposing 2 solutions

Solution 1: Make a noise table and replace the noises with blank spaces

e.g.

DECLARE @String VARCHAR(MAX)
DECLARE @Noise TABLE(Noise VARCHAR(100),ReplaceChars VARCHAR(10))
SET @String = 'hello! how * > are % u (: . I am ok :). Oh nice!'

INSERT INTO @Noise(Noise,ReplaceChars)
SELECT '!',SPACE(1) UNION ALL SELECT '@',SPACE(1) UNION ALL
SELECT '#',SPACE(1) UNION ALL SELECT '$',SPACE(1) UNION ALL
SELECT '%',SPACE(1) UNION ALL SELECT '^',SPACE(1) UNION ALL
SELECT '&',SPACE(1) UNION ALL SELECT '*',SPACE(1) UNION ALL
SELECT '(',SPACE(1) UNION ALL SELECT ')',SPACE(1) UNION ALL
SELECT '{',SPACE(1) UNION ALL SELECT '}',SPACE(1) UNION ALL
SELECT '<',SPACE(1) UNION ALL SELECT '>',SPACE(1) UNION ALL
SELECT ':',SPACE(1)

SELECT @String = REPLACE(@String, Noise, ReplaceChars) FROM @Noise
SELECT @String Data

Solution 2: With a number table

DECLARE @String VARCHAR(MAX)
SET @String = 'hello! & how * > are % u (: . I am ok :). Oh nice!'

;with numbercte as
(
 select 1 as rn
 union all
 select rn+1 from numbercte where rn<LEN(@String)
)
select REPLACE(FilteredData,'&#x20;',SPACE(1)) Data from 
(select SUBSTRING(@String,rn,1) 
from numbercte  
where SUBSTRING(@String,rn,1) not in('!','*','>','<','%','(',')',':','!','&','@','#','$')

for xml path(''))X(FilteredData)

Output(Both the cases)

Data

hello  how   are  u  . I am ok . Oh nice

Note- I have just put some of the noises. You may need to put the noises that u need.

Hope this helps

You can use regular expressions in SQL Server - here is an article based on SQL 2005:

http://msdn.microsoft.com/en-us/magazine/cc163473.aspx

I'd wrap it in a simple scalar UDF so all string cleaning is in one place if it's needed again.

Then you can use it on INSERT too...

If it's a one-off thing, I would use a C# + LINQ snippet in LINQPad to do the job with regular expressions.

It is quick and easy and you don't have to go through the process of setting up a CLR stored procedure and then cleaning up after yourself.

Can't you use PATINDEX to only include NUMBERS and LETTERS instead of trying to guess what punctuation might be in the field? (Not trying to be snarky, if I had the code ready, I'd share it...but this is what I'm looking for).

Seems like you need to create a custom function in order to avoid a giant list of replace functions in your queries - here's a good example:

http://www.codeproject.com/KB/database/SQLPhoneNumbersPart_2.aspx?display=Print

I took Ken MC's solution and made it into an function which can replace all punctuation with a given string:

----------------------------------------------------------------------------------------------------------------
-- This function replaces all punctuation in the given string with the "replaceWith" string
----------------------------------------------------------------------------------------------------------------
IF object_id('[dbo].[fnReplacePunctuation]') IS NOT NULL
BEGIN
    DROP FUNCTION [dbo].[fnReplacePunctuation];
END;
GO
CREATE FUNCTION [dbo].[fnReplacePunctuation] (@string NVARCHAR(MAX), @replaceWith NVARCHAR(max))
RETURNS NVARCHAR(MAX)
BEGIN
    DECLARE @Result Varchar(max) = @string;
    DECLARE @BadChars Varchar(12) = '%[^a-z0-9]%'; -- to leave spaces - SELECT @BadChars = '%[^a-z0-9] %'
    DECLARE @p int = PatIndex(@BadChars,@Result);
    DECLARE @searchFrom INT;
    DECLARE @indexOfPunct INT = @p;

    WHILE @indexOfPunct > 0 BEGIN
      SET @searchFrom = LEN(@Result) - @p;
      SET @Result = Left(@Result, @p-1) + @replaceWith + Substring(@Result, @p+1,LEN(@Result));
      SET @IndexOfPunct = PatIndex(@BadChars, substring(@Result, (LEN(@Result) - @SearchFrom)+1, LEN(@Result)));
      SET @p = (LEN(@Result) - @searchFrom) + @indexOfPunct;
    END
    RETURN @Result;
END;
GO
-- example:
SELECT dbo.fnReplacePunctuation('This is, only, a tést-really..', '');

Output:

Thisisonlyatéstreally
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!