How to find invalid Char in a SQL table

问题

This has been a major headache for the last couple of weeks. I have a largish table (165 columns x 11000+ rows). In this table, there are several comment columns that are set to varchar(max). There is one in particular that keeps getting invalid characters pasted into it by various users. This causes the reports in SSRS to fail. I then have to go and find these invalid characters and remove them. This has been a very painstaking time-consuming task.

What I would like to do is find a way to search automatically for these invalid characters and replace them with nothing. The problem is that I have no idea how to go about a search directly for these characters. Here is what they look like:

and here's another image of the same:

And here is what it looks like when I paste it into Notepad++:

I am not sure if it'll work and show up the way I see it, but here are the characters:

㹊潮Ņ᯸ࢹᖈư㹨ƶ槹鎤⻄ƺ綐ڌ⸀ƺ삸)䀤ƍ샄)Ņᛡ鎤ꗘᖃᒨ쬵Ğᘍ鎤ᐜᏰ>֔υ赸Ƹ쳰డ촜)鉀௿촜)쮜)Ἡ屰山舰霡ࣆ 耏Аం畠Ư놐ᓜતᏛ֔Ꮫ֨Ꮫ꯼ᓜƒ 邰఍厰ఆ邰఍드)抉鎤듄)繟Ĺ띨)᯸ࢹ䮸ࣉ᯸ࢹ䮸ࣉ샰)ԌƏŅ֐ᕄ홑Ņᛙ鎤ꗘᖃᒨ᯸ࢹ

They look like they are Chinese or something similar, but I've tried using Google Translate and it detects them as English.

Any help in figuring out a way to search for these? Create a Function or an SP would be fine as long as it works!

UPDATE

I've tried part of a solution I found here: How can I find Unicode/non-ASCII characters in an NTEXT field in a SQL Server 2005 table? and used this:

-- Start with tab, line feed, carriage return
declare @str varchar(1024)
set @str = '|' + char(9) + '|' + char(10) + '|' + char(13)

-- Add all normal ASCII characters (32 -> 127)
declare @i int
set @i = 32
while @i <= 127
    begin
    -- Uses | to escape, could be any character
    set @str = @str + '|' + char(@i)
    set @i = @i + 1
    end

select MEETING_NOTES from pmdb.TrackerData
where MEETING_NOTES like '%[^' + @str + ']%' escape '|'

But it is returning a lot more rows than it should be. I currently only have 1 row that has these invalid characters and it is returning 1708.

UPDATE 2

I have created a Function to try and remove all the invalid characters like this:

ALTER FUNCTION [dbo].[RemoveNonPrintable]
(
    @inputtext nvarchar(max) 
)
RETURNS nvarchar(max)
AS
BEGIN

    DECLARE @counter int = 1;
    DECLARE @colString nvarchar(1000)
    set @inputtext = REPLACE(@inputtext, char(0), '') -- 'NULL'
    set @inputtext = REPLACE(@inputtext, char(1), '') -- 'Start of Heading'
    set @inputtext = REPLACE(@inputtext, char(2), '') -- 'Start of Text'
    set @inputtext = REPLACE(@inputtext, char(3), '') -- 'End of Text'
    set @inputtext = REPLACE(@inputtext, char(4), '') -- 'End of Transmission'
    set @inputtext = REPLACE(@inputtext, char(5), '') -- 'Enquiry'
    set @inputtext = REPLACE(@inputtext, char(6), '') -- 'Acknowledgement'
    set @inputtext = REPLACE(@inputtext, char(7), '') -- 'Bell'
    set @inputtext = REPLACE(@inputtext, char(8), '') -- 'Backspace'
    set @inputtext = REPLACE(@inputtext, char(9), '') -- 'Horizontal Tab'
    -- replace line feed with blank, so words that were in different lines before are still separated
    set @inputtext = REPLACE(@inputtext, char(10), ' ') -- 'Line Feed'
    set @inputtext = REPLACE(@inputtext, char(11), '') -- 'Vertical Tab'
    set @inputtext = REPLACE(@inputtext, char(12), '') -- 'Form Feed'        
    -- replace carriage return with blank, so words that were in different lines before are still separated
    set @inputtext = REPLACE(@inputtext, char(13), ' ') -- 'Carriage Return'
    set @inputtext = REPLACE(@inputtext, char(14), '') -- 'Shift Out'
    set @inputtext = REPLACE(@inputtext, char(15), '') -- 'Shift In'
    set @inputtext = REPLACE(@inputtext, char(16), '') -- 'Data Link Escape'
    set @inputtext = REPLACE(@inputtext, char(17), '') -- 'Device Control 1'
    set @inputtext = REPLACE(@inputtext, char(18), '') -- 'Device Control 2'
    set @inputtext = REPLACE(@inputtext, char(19), '') -- 'Device Control 3'
    set @inputtext = REPLACE(@inputtext, char(20), '') -- 'Device Control 4'
    set @inputtext = REPLACE(@inputtext, char(21), '') -- 'Negative Acknowledgment'
    set @inputtext = REPLACE(@inputtext, char(22), '') -- 'Synchronous Idle'
    set @inputtext = REPLACE(@inputtext, char(23), '') -- 'End of Transmission Block'
    set @inputtext = REPLACE(@inputtext, char(24), '') -- 'Cancel'
    set @inputtext = REPLACE(@inputtext, char(25), '') -- 'End of Medium'
    set @inputtext = REPLACE(@inputtext, char(26), '') -- 'Substitute'
    set @inputtext = REPLACE(@inputtext, char(27), '') -- 'Escape'
    set @inputtext = REPLACE(@inputtext, char(28), '') -- 'File Separator'
    set @inputtext = REPLACE(@inputtext, char(29), '') -- 'Group Separator'
    set @inputtext = REPLACE(@inputtext, char(30), '') -- 'Record Separator'
    set @inputtext = REPLACE(@inputtext, char(31), '') -- 'Unit Separator'
    set @inputtext = REPLACE(@inputtext, char(127), '') -- 'Delete'

    set @colString = @inputtext

    WHILE @counter <= DATALENGTH(@colString)
       BEGIN
           set @colString   = REPLACE(@colString,isnull(NCHAR(UNICODE(SUBSTRING(@colString, @counter, 1))),'|'),'|')
           set @colString = REPLACE(@colString,'|','')

           SET @counter = @counter + 1
       END
    return @inputtext       
END

I call it like this:

BEGIN TRAN  --COMMIT    ROLLBACK
update pmdb.TrackerData
set CIRCUIT_COMMENTS = [dbo].[RemoveNonPrintable](CIRCUIT_COMMENTS),
    COE_COMMENTS = [dbo].[RemoveNonPrintable](COE_COMMENTS),
    MEETING_NOTES = [dbo].[RemoveNonPrintable](MEETING_NOTES),
    OSP_COMMENTS = [dbo].[RemoveNonPrintable](OSP_COMMENTS),
    COE_COMMENTS2 = [dbo].[RemoveNonPrintable](COE_COMMENTS2)

Then I run the code from the previous update to see if there is any difference. There is no difference. What gives? Am I doing this wrong?

EDIT 3

I've updated my function to have this:

    set @colString = @inputtext

    WHILE @counter <= DATALENGTH(@colString)
       BEGIN
           --set @colString = REPLACE(@colString,isnull(NCHAR(UNICODE(SUBSTRING(@colString, @counter, 1))),'|'),'|')
           --set @colString = REPLACE(@colString,'|','')
           if (UNICODE(SUBSTRING(@colString, @counter,1)) > 126)
           BEGIN
            SET @colString = REPLACE(@colString, CONVERT(nvarchar(1),(SUBSTRING(@colString, @counter,1))), CHAR(32))
           END
           ELSE IF(UNICODE(SUBSTRING(@colString, @counter, 1)) < 32)
           BEGIN
            SET @colString = REPLACE(@colString, CONVERT(nvarchar(1),(SUBSTRING(@colString, @counter,1))), CHAR(32))
           END

           set @inputtext = @colString

           SET @counter = @counter + 1
       END

It removes most of the invalid characters, but then it leaves others. I call it on a temp table that I created that holds the sample of invalid characters shown above like this:

update #Temp
set Notes = [dbo].[RemoveNonPrintable](Notes),
    Notes2 = [dbo].[RemoveNonPrintable](Notes2)

Then I'm left with the following in the two Notes:

Notes: ????N???u?z?????????)???)?N??????G????>???????)???)?)???????? ????U??????????  ???????)???)?L?)?????????)?????N???N???????

Notes2:     ᯸ࢹᖈ 㹨   ⻄   ⸀  )䀤  )  ᛡ ꗘᖃᒨ  ᘍ ᐜᏰ>֔      ) ௿ ) )Ἡ      ࣆ       ᓜ Ꮫ֔Ꮫ֨Ꮫ꯼ᓜ   ఍   ఍ )   )   )᯸ࢹ䮸ࣉ᯸ࢹ䮸ࣉ )Ԍ   ֐ᕄ  ᛙ ꗘᖃᒨ᯸ࢹ

Which is better than what I started with, but still not good enough.

回答1:

I have found a solution in another users question here

I modified it slightly though. What ends up working for me is this:

ALTER FUNCTION [dbo].[RemoveNonASCII] 
(
    -- Parameters
    @nstring nvarchar(max)
)
RETURNS varchar(max)
AS
BEGIN
    -- Variables
    DECLARE @Result varchar(max) = '',@nchar nvarchar(1), @position int

    -- T-SQL statements to compute the return value
    set @position = 1
    while @position <= LEN(@nstring)
    BEGIN
        set @nchar = SUBSTRING(@nstring, @position, 1)
        if UNICODE(@nchar) between 32 and 127
            set @Result = @Result + @nchar
        set @position = @position + 1
        set @Result = REPLACE(@Result,'))','')
        set @Result = REPLACE(@Result,'?','')
    END

    -- Return the result
    RETURN @Result

END

来源：https://stackoverflow.com/questions/44804879/how-to-find-invalid-char-in-a-sql-table

标签

sql-server

reporting-services

ssrs-2012