Does anyone know a good way to remove punctuation from a field in SQL Server?
I'm thinking
UPDATE tblMyTable SET FieldName = REPLACE(REPLACE(REPLACE(FieldNa
You can use regular expressions in SQL Server through CLR integration - here is an article based on SQL Server 2005:
http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
Can't you use PATINDEX to only include NUMBERS and LETTERS instead of trying to guess what punctuation might be in the field? (Not trying to be snarky, if I had the code ready, I'd share it...but this is what I'm looking for).
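The whitelist idea can be tried out with a read-only query first. This is only a sketch: it reuses the table and column names from the question (tblMyTable, FieldName) and simply lists the rows that contain at least one character that is not a letter or a digit. How the ranges treat accented or uppercase characters depends on the column's collation.

```sql
-- Sketch only: tblMyTable and FieldName are the names used in the question.
-- Returns rows whose value contains at least one character
-- outside the set of letters and digits.
SELECT FieldName
FROM tblMyTable
WHERE PATINDEX('%[^a-zA-Z0-9]%', FieldName) > 0;
```

PATINDEX returns 0 when the pattern is not found, so the filter keeps only rows that would need cleaning.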
Seems like you need to create a custom function in order to avoid a giant list of replace functions in your queries - here's a good example:
http://www.codeproject.com/KB/database/SQLPhoneNumbersPart_2.aspx?display=Print
I am proposing two solutions.
Solution 1: Make a noise table and replace the noises with blank spaces
e.g.
DECLARE @String VARCHAR(MAX)
DECLARE @Noise TABLE(Noise VARCHAR(100),ReplaceChars VARCHAR(10))
SET @String = 'hello! how * > are % u (: . I am ok :). Oh nice!'
INSERT INTO @Noise(Noise,ReplaceChars)
SELECT '!',SPACE(1) UNION ALL SELECT '@',SPACE(1) UNION ALL
SELECT '#',SPACE(1) UNION ALL SELECT '$',SPACE(1) UNION ALL
SELECT '%',SPACE(1) UNION ALL SELECT '^',SPACE(1) UNION ALL
SELECT '&',SPACE(1) UNION ALL SELECT '*',SPACE(1) UNION ALL
SELECT '(',SPACE(1) UNION ALL SELECT ')',SPACE(1) UNION ALL
SELECT '{',SPACE(1) UNION ALL SELECT '}',SPACE(1) UNION ALL
SELECT '<',SPACE(1) UNION ALL SELECT '>',SPACE(1) UNION ALL
SELECT ':',SPACE(1)
SELECT @String = REPLACE(@String, Noise, ReplaceChars) FROM @Noise
SELECT @String Data
Solution 2: With a number table
DECLARE @String VARCHAR(MAX)
SET @String = 'hello! & how * > are % u (: . I am ok :). Oh nice!'
;with numbercte as
(
select 1 as rn
union all
select rn+1 from numbercte where rn<LEN(@String)
)
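-- FOR XML PATH entitizes a lone space as &#x20;, so turn it back into a real space
select REPLACE(FilteredData,'&#x20;',SPACE(1)) Data from
(select SUBSTRING(@String,rn,1)
from numbercte
where SUBSTRING(@String,rn,1) not in('!','*','>','<','%','(',')',':','&','@','#','$')
order by rn -- keep the characters in their original order
for xml path(''))X(FilteredData)
option (maxrecursion 0) -- the default limit of 100 recursions fails on strings longer than 100 characters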
Output (in both cases)
Data
hello how are u . I am ok . Oh nice
Note: I have only listed some of the noise characters. Add whichever ones you need.
Hope this helps
I wanted to avoid creating a table and wanted to remove everything except letters and digits.
-- @InStr is the input string, e.g. the parameter of the function this is wrapped in
DECLARE @p int
DECLARE @Result Varchar(250)
DECLARE @BadChars Varchar(12)
SELECT @BadChars = '%[^a-z0-9]%'
-- to leave spaces - SELECT @BadChars = '%[^a-z0-9 ]%'
SET @Result = @InStr
SET @p = PatIndex(@BadChars,@Result)
WHILE @p > 0 BEGIN
-- cut out the single unwanted character found at position @p
SELECT @Result = Left(@Result,@p-1) + Substring(@Result,@p+1,250)
SET @p = PatIndex(@BadChars,@Result)
END
I'd wrap it in a simple scalar UDF so all string cleaning is in one place if it's needed again.
Then you can use it on INSERT too...
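A minimal sketch of what such a scalar UDF might look like, assuming the PATINDEX loop above is what gets wrapped. The function name dbo.fn_StripNonAlphanumeric is made up for illustration, and the 250-character limit is carried over from the snippet above. Whether uppercase letters survive the [a-z] range depends on the collation in use.

```sql
-- Hypothetical function name; the body is the PATINDEX loop from the post above.
CREATE FUNCTION dbo.fn_StripNonAlphanumeric
(
    @InStr VARCHAR(250)
)
RETURNS VARCHAR(250)
AS
BEGIN
    DECLARE @p INT
    DECLARE @Result VARCHAR(250)

    SET @Result = @InStr
    -- position of the first character that is not a letter or a digit
    SET @p = PATINDEX('%[^a-z0-9]%', @Result)

    WHILE @p > 0
    BEGIN
        -- drop that one character, then look for the next one
        SET @Result = LEFT(@Result, @p - 1) + SUBSTRING(@Result, @p + 1, 250)
        SET @p = PATINDEX('%[^a-z0-9]%', @Result)
    END

    RETURN @Result
END
GO
```

It could then be called like any other scalar function, for example SELECT dbo.fn_StripNonAlphanumeric('hello! how * are u?'), or inside an INSERT or UPDATE. Scalar UDFs run once per row, so on large tables this is likely to be slow.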
Ideally, you would do this in an application language such as C# + LINQ as mentioned above.
If you wanted to do it purely in T-SQL though, one way to make things neater would be to first create a table that holds all the punctuation you want to remove.
CREATE TABLE Punctuation
(
Symbol VARCHAR(1) NOT NULL
)
INSERT INTO Punctuation (Symbol) VALUES('''')
INSERT INTO Punctuation (Symbol) VALUES('-')
INSERT INTO Punctuation (Symbol) VALUES('.')
Next, you could create a function in SQL to remove all the punctuation symbols from an input string.
CREATE FUNCTION dbo.fn_RemovePunctuation
(
@InputString VARCHAR(500)
)
RETURNS VARCHAR(500)
AS
BEGIN
SELECT
@InputString = REPLACE(@InputString, P.Symbol, '')
FROM
Punctuation P
RETURN @InputString
END
GO
Then you can just call the function in your UPDATE statement:
UPDATE tblMyTable SET FieldName = dbo.fn_RemovePunctuation(FieldName)
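Before running the UPDATE against real data, it may be worth previewing what would change. The query below is only a sketch using the same table, column, and function names as above; the alias Cleaned is made up for illustration.

```sql
-- Preview: show only the rows the function would actually modify.
SELECT
    FieldName,
    dbo.fn_RemovePunctuation(FieldName) AS Cleaned
FROM
    tblMyTable
WHERE
    FieldName <> dbo.fn_RemovePunctuation(FieldName)
```

The same WHERE clause can be added to the UPDATE so that rows without punctuation are not rewritten.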