What's the best way to identify hidden characters in the result of a query in SQL Server (Query Analyzer)?

强颜欢笑 提交于 2019-11-28 05:46:10

Create a function that addresses all the whitespace possibilites and enable only those that seem appropriate:

SELECT dbo.ShowWhiteSpace(myfield) from mytable

Uncomment only those whitespace cases you want to test for:


CREATE FUNCTION dbo.ShowWhiteSpace (@str varchar(8000))
RETURNS varchar(8000)
AS
BEGIN
     DECLARE @ShowWhiteSpace varchar(8000);
     SET @ShowWhiteSpace = @str
     SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(32), '[?]')
     SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(13), '[CR]')
     SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(10), '[LF]')
     SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(9),  '[TAB]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(1),  '[SOH]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(2),  '[STX]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(3),  '[ETX]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(4),  '[EOT]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(5),  '[ENQ]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(6),  '[ACK]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(7),  '[BEL]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(8),  '[BS]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(11), '[VT]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(12), '[FF]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(14), '[SO]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(15), '[SI]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(16), '[DLE]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(17), '[DC1]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(18), '[DC2]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(19), '[DC3]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(20), '[DC4]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(21), '[NAK]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(22), '[SYN]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(23), '[ETB]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(24), '[CAN]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(25), '[EM]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(26), '[SUB]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(27), '[ESC]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(28), '[FS]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(29), '[GS]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(30), '[RS]')
--   SET @ShowWhiteSpace = REPLACE( @ShowWhiteSpace, CHAR(31), '[US]')
     RETURN(@ShowWhiteSpace)
END
gbn

To find them, you can use this

;WITH cte AS
(
   SELECT 0 AS CharCode
   UNION ALL
   SELECT CharCode + 1 FROM cte WHERE CharCode <31
)
SELECT
   *
FROM
   mytable T
     cross join cte
WHERE
   EXISTS (SELECT *
        FROM mytable Tx
        WHERE Tx.PKCol = T.PKCol
             AND
              Tx.MyField LIKE '%' + CHAR(cte.CharCode) + '%'
         )

Replacing the EXISTS with a JOIN will allow you to REPLACE them, but you'll get multiple rows... I can't think of a way around that...

They way I did it was by selecting all of the data

select * from myTable and then right-clicking on the result set and chose "Save results as..." a csv file.

Opening the csv file in Notepad++ I saw the LF characters not visible in SQL Server result set.

You can always use the DATALENGTH Function to determine if you have extra white space characters in text fields. This won't make the text visible but will show you where there are extra white space characters.

    SELECT DATALENGTH('MyTextData ') AS BinaryLength, LEN('MyTextData ') AS TextLength

This will produce 11 for BinaryLength and 10 for TextLength.

In a table your SQL would like this:

    SELECT * 
    FROM tblA
    WHERE DATALENGTH(MyTextField) > LEN(MyTextField)

This function is usable in all versions of SQL Server beginning with 2005.

select myfield, CAST(myfield as varbinary(max)) ...

I have faced the same problem with a character that I never managed to match with a where query - CHARINDEX, LIKE, REPLACE, etc. did not work. Then I have used a brute force solution which is awful, heavy but works:

Step 1: make a copy of the complete data set - keep track of the original names with an source_id referencing the pk of the source table (and keep this source id in all the subsequent tables). Step 2: LTRIM RTRIM the data, and replace all double spaces, tab, etc (basically all the CHAR(1) to CHAR(32) by one space. Lowercase the whole set as well. Step 3: replace all the special characters that you know (get the list of all the quotes, double quotes, etc.) by something from a-z (I suggest z). Basically replace everything that is not standard English characters by a z (using nested REPLACE of REPLACE in a loop). Step 4: split by word into a second copy, where each word is in a separate row - the split is a SUBSTRING based on the position of the space characters - at this point, we should miss the ones where there's a hidden space that we did not catche earlier. Step 5: split each word into a third copy, where each letter is in a separate row (I know it makes a very large table) - keep track of the charindex of each letter in a separate column. Step 6: Select everything in the above table which is not LIKE [a-z]. This is the list of the unidentified characters we want to exclude.

From the output of step 6 we have enough data to make a series of substring of the source to select everything but the unknown character we want to exclude.

Note 1: there are smart ways to optimize this, depending on the size of the original expression (steps 4, 5 and 6 can be made in one go).

Note 2: this is not very fast, but the fastest way to get this done for a large data set, because the split of lines into words and words into letters is made by substring, which slices all the table into one character slices. However, this is quite heavy to build. With a smaller set, it may be enough to parse each record one by one and search for character which is not in a list of all English characters plus all special characters.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!