Question
I am trying to convert HTML entity names such as &amp; and &quot; to their equivalent CHAR values using the SQL below. I was testing this in SQL Server 2012.
Test 1 (This works fine):
GO
DECLARE @inputString VARCHAR(MAX)= '&amp;testString&amp;'
DECLARE @codePos INT, @codeEncoded VARCHAR(7), @startIndex INT, @resultString varchar(max)
SET @resultString = LTRIM(RTRIM(@inputString))
SELECT @startIndex = PATINDEX('%&amp;%', @resultString)
WHILE @startIndex > 0
BEGIN
    SELECT @resultString = REPLACE(@resultString, '&amp;', '&'), @startIndex=PATINDEX('%&amp;%', @resultString)
END
PRINT @resultString
GO
Output:
&testString&
Test 2 (this does not work): Since the above worked, I tried to extend it to handle more characters, as follows:
DECLARE @htmlNames TABLE (ID INT IDENTITY(1,1), asciiDecimal INT, htmlName varchar(50))
INSERT INTO @htmlNames
VALUES (34,'&quot;'),(38,'&amp;'),(60,'&lt;'),(62,'&gt;'),(160,'&nbsp;'),(161,'&iexcl;'),(162,'&cent;')
-- I would load the full list of HTML names into this TABLE variable, but removed it for testing purposes
DECLARE @inputString VARCHAR(MAX)= '&amp;testString&amp;'
DECLARE @count INT = 0
DECLARE @id INT = 1
DECLARE @charCode INT, @htmlName VARCHAR(30)
DECLARE @codePos INT, @codeEncoded VARCHAR(7), @startIndex INT
, @resultString varchar(max)
SELECT @count=COUNT(*) FROM @htmlNames
WHILE @id <=@count
BEGIN
SELECT @charCode = asciiDecimal, @htmlname = htmlName
FROM @htmlNames
WHERE ID = @id
SET @resultString = LTRIM(RTRIM(@inputString))
SELECT @startIndex = PATINDEX('%' + @htmlName + '%', @resultString)
While @startIndex > 0
BEGIN
--PRINT @resultString + '|' + @htmlName + '|' + NCHAR(@charCode)
SELECT @resultString = REPLACE(@resultString, @htmlName, NCHAR(@charCode))
SET @startIndex=PATINDEX('%' + @htmlName + '%', @resultString)
END
SET @id=@id + 1
END
PRINT @resultString
GO
Output:
&amp;testString&amp;
I cannot figure out where I'm going wrong. Any help would be much appreciated.
I am not interested in loading the string values into the application layer, applying HTMLDecode there, and saving them back to the database.
EDIT:
The line SET @resultString = LTRIM(RTRIM(@inputString)) was inside the WHILE loop, so I was overwriting the result with @inputString on every iteration. Thank you, YanireRomero.
I like @RichardDeeming's solution too, but it didn't suit my needs in this case.
Answer 1:
Here's a simpler solution that doesn't need a loop:
DECLARE @htmlNames TABLE
(
ID INT IDENTITY(1,1),
asciiDecimal INT,
htmlName varchar(50)
);
INSERT INTO @htmlNames
VALUES
(34,'&quot;'),
(38,'&amp;'),
(60,'&lt;'),
(62,'&gt;'),
(160,'&nbsp;'),
(161,'&iexcl;'),
(162,'&cent;')
;
DECLARE @inputString varchar(max)= '&amp;test&amp;quot;&lt;String&gt;&quot;&amp;';
DECLARE @resultString varchar(max) = @inputString;
-- Simple HTML-decode:
SELECT
@resultString = Replace(@resultString COLLATE Latin1_General_CS_AS, htmlName, NCHAR(asciiDecimal))
FROM
@htmlNames
;
SELECT @resultString;
-- Output: &test&quot;<String>"&
-- Multiple HTML-decode:
SET @resultString = @inputString;
DECLARE @temp varchar(max) = '';
WHILE @resultString != @temp
BEGIN
SET @temp = @resultString;
SELECT
@resultString = Replace(@resultString COLLATE Latin1_General_CS_AS, htmlName, NCHAR(asciiDecimal))
FROM
@htmlNames
;
END;
SELECT @resultString;
-- Output: &test"<String>"&
EDIT: Changed to NCHAR, as suggested by @tomasofen, and added a case-sensitive collation to the REPLACE function, as suggested by @TechyGypo.
Answer 2:
For the sake of performance, this isn't something you should write as T-SQL statements or as a SQL scalar-valued function. The .NET libraries provide excellent, fast, and, above all, reliable HTML decoding. In my opinion, you should implement this as a SQL CLR function, like this:
using Microsoft.SqlServer.Server;
using System.Data.SqlTypes;
using System.Net;
public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction(
        IsDeterministic = true,
        IsPrecise = true,
        DataAccess = DataAccessKind.None,
        SystemDataAccess = SystemDataAccessKind.None)]
    [return: SqlFacet(MaxSize = 4000)]
    public static SqlString cfnHtmlDecode([SqlFacet(MaxSize = 4000)] SqlString input)
    {
        // Propagate SQL NULL unchanged.
        if (input.IsNull)
            return SqlString.Null;

        // WebUtility.HtmlDecode handles both named and numeric entities.
        return WebUtility.HtmlDecode(input.Value);
    }
}
Then in your T-SQL, call it like this:
SELECT clr_schema.cfnHtmlDecode(column_name) FROM table_schema.table_name
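Before that call works, the compiled assembly has to be registered in the database. The following is only a rough sketch of that step: the assembly name HtmlUtils and the DLL path are placeholders, and PERMISSION_SET = SAFE is an assumption that may need adjusting for your server version and security settings.

```sql
-- Enable CLR integration on the instance (requires appropriate permissions).
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO

-- Register the compiled DLL. The assembly name and path are hypothetical.
CREATE ASSEMBLY HtmlUtils
FROM 'C:\path\to\HtmlUtils.dll'
WITH PERMISSION_SET = SAFE;  -- assumption: adjust if your environment requires it
GO

-- Expose the static method as a scalar function in the schema used above.
-- The class has no namespace, so the external name is Assembly.Class.Method.
CREATE FUNCTION clr_schema.cfnHtmlDecode (@input nvarchar(4000))
RETURNS nvarchar(4000)
AS EXTERNAL NAME HtmlUtils.UserDefinedFunctions.cfnHtmlDecode;
GO
```

The nvarchar(4000) parameter and return type mirror the SqlFacet(MaxSize = 4000) attributes in the C# code.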
Answer 3:
It was an assignment error:
DECLARE @htmlNames TABLE (ID INT IDENTITY(1,1), asciiDecimal INT, htmlName varchar(50))
INSERT INTO @htmlNames
VALUES (34,'&quot;'),(38,'&amp;'),(60,'&lt;'),(62,'&gt;'),(160,'&nbsp;'),(161,'&iexcl;'),(162,'&cent;')
-- I would load the full list of HTML names into this TABLE variable, but removed it for testing purposes
DECLARE @inputString VARCHAR(MAX)= '&amp;testString&amp;'
DECLARE @count INT = 0
DECLARE @id INT = 1
DECLARE @charCode INT, @htmlName VARCHAR(30)
DECLARE @codePos INT, @codeEncoded VARCHAR(7), @startIndex INT
, @resultString varchar(max)
SELECT @count=COUNT(*) FROM @htmlNames
SET @resultString = LTRIM(RTRIM(@inputString))
WHILE @id <=@count
BEGIN
SELECT @charCode = asciiDecimal, @htmlname = htmlName
FROM @htmlNames
WHERE ID = @id
SELECT @startIndex = PATINDEX('%' + @htmlName + '%', @resultString)
While @startIndex > 0
BEGIN
--PRINT @resultString + '|' + @htmlName + '|' + NCHAR(@charCode)
SET @resultString = REPLACE(@resultString, @htmlName, NCHAR(@charCode))
SET @startIndex=PATINDEX('%' + @htmlName + '%', @resultString)
END
SET @id=@id + 1
END
PRINT @resultString
GO
The line SET @resultString = LTRIM(RTRIM(@inputString)) was inside the WHILE loop, so you were overwriting your result on every iteration. Moving it above the loop, as shown, fixes the problem.
Hope it helps.
Answer 4:
Some additional help for Richard Deeming's answer, to save some typing for future visitors who want to extend the table with more codes:
INSERT INTO @htmlNames
VALUES
(34,'&quot;'),
(38,'&amp;'),
(60,'&lt;'),
(62,'&gt;'),
(160, '&nbsp;'),
(161, '&iexcl;'),
(162, '&cent;'),
(163, '&pound;'),
(164, '&curren;'),
(165, '&yen;'),
(166, '&brvbar;'),
(167, '&sect;'),
(168, '&uml;'),
(169, '&copy;'),
(170, '&ordf;'),
(171, '&laquo;'),
(172, '&not;'),
(173, '&shy;'),
(174, '&reg;'),
(175, '&macr;'),
(176, '&deg;'),
(177, '&plusmn;'),
(178, '&sup2;'),
(179, '&sup3;'),
(180, '&acute;'),
(181, '&micro;'),
(182, '&para;'),
(183, '&middot;'),
(184, '&cedil;'),
(185, '&sup1;'),
(186, '&ordm;'),
(187, '&raquo;'),
(188, '&frac14;'),
(189, '&frac12;'),
(190, '&frac34;'),
(191, '&iquest;'),
(192, '&Agrave;'),
(193, '&Aacute;'),
(194, '&Acirc;'),
(195, '&Atilde;'),
(196, '&Auml;'),
(197, '&Aring;'),
(198, '&AElig;'),
(199, '&Ccedil;'),
(200, '&Egrave;'),
(201, '&Eacute;'),
(202, '&Ecirc;'),
(203, '&Euml;'),
(204, '&Igrave;'),
(205, '&Iacute;'),
(206, '&Icirc;'),
(207, '&Iuml;'),
(208, '&ETH;'),
(209, '&Ntilde;'),
(210, '&Ograve;'),
(211, '&Oacute;'),
(212, '&Ocirc;'),
(213, '&Otilde;'),
(214, '&Ouml;'),
(215, '&times;'),
(216, '&Oslash;'),
(217, '&Ugrave;'),
(218, '&Uacute;'),
(219, '&Ucirc;'),
(220, '&Uuml;'),
(221, '&Yacute;'),
(222, '&THORN;'),
(223, '&szlig;'),
(224, '&agrave;'),
(225, '&aacute;'),
(226, '&acirc;'),
(227, '&atilde;'),
(228, '&auml;'),
(229, '&aring;'),
(230, '&aelig;'),
(231, '&ccedil;'),
(232, '&egrave;'),
(233, '&eacute;'),
(234, '&ecirc;'),
(235, '&euml;'),
(236, '&igrave;'),
(237, '&iacute;'),
(238, '&icirc;'),
(239, '&iuml;'),
(240, '&eth;'),
(241, '&ntilde;'),
(242, '&ograve;'),
(243, '&oacute;'),
(244, '&ocirc;'),
(245, '&otilde;'),
(246, '&ouml;'),
(247, '&divide;'),
(248, '&oslash;'),
(249, '&ugrave;'),
(250, '&uacute;'),
(251, '&ucirc;'),
(252, '&uuml;'),
(253, '&yacute;'),
(254, '&thorn;'),
(255, '&yuml;'),
(8364, '&euro;');
EDITED:
If you want the euro symbol to work (and, in general, any character code above 255), you will need to use NCHAR instead of CHAR in Richard Deeming's code.
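A quick way to see the difference is the sketch below. CHAR only accepts codes from 0 to 255 and returns NULL for anything else, and REPLACE returns NULL when any of its arguments is NULL. The variable @price and its nvarchar type are assumptions made for this example.

```sql
-- CHAR is limited to codes 0 through 255; NCHAR accepts Unicode code points.
SELECT CHAR(8364)  AS char_result,   -- NULL
       NCHAR(8364) AS nchar_result;  -- the euro sign

-- Hypothetical input; nvarchar is used so the euro sign can be stored safely.
DECLARE @price nvarchar(50) = N'10 &euro;';

SELECT REPLACE(@price, N'&euro;', CHAR(8364))  AS with_char,   -- NULL, the whole string is lost
       REPLACE(@price, N'&euro;', NCHAR(8364)) AS with_nchar;  -- 10 €
```

So with CHAR, a single out-of-range row in the lookup table would turn the entire decoded string into NULL, not just the one entity. Note also that a varchar result variable can only hold characters available in its code page, so nvarchar may be the safer choice if the table includes codes above 255.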
Source: https://stackoverflow.com/questions/25432805/sql-server-html-decode-based-on-the-html-names-in-a-string-input