SQL Server: HTML Decode based on the HTML names in a String input

Submitted by 五迷三道 on 2021-02-07 10:15:28

Question


I am trying to convert HTML entity names such as &amp;amp; and &amp;quot; to their equivalent CHAR values using the SQL below. I was testing this in SQL Server 2012.

Test 1 (This works fine):

GO
DECLARE @inputString VARCHAR(MAX)= '&amp;testString&amp;'
DECLARE @codePos INT, @codeEncoded VARCHAR(7), @startIndex INT, @resultString varchar(max)
SET @resultString = LTRIM(RTRIM(@inputString))
SELECT @startIndex = PATINDEX('%&amp;%', @resultString)
WHILE @startIndex > 0 
BEGIN
    SELECT @resultString = REPLACE(@resultString, '&amp;', '&'), @startIndex=PATINDEX('%&amp;%', @resultString)
END

PRINT @resultString
Go

Output:

&testString&

Test 2 (this does not work): Since the above worked, I tried to extend it to handle more characters, as follows:

DECLARE @htmlNames TABLE (ID INT IDENTITY(1,1), asciiDecimal INT, htmlName varchar(50))
INSERT INTO @htmlNames
VALUES (34,'&quot;'),(38,'&amp;'),(60,'&lt;'),(62,'&gt;'),(160,'&nbsp;'),(161,'&iexcl;'),(162,'&cent;')
-- I would load the full list of HTML names into this TABLE variable, but removed it for testing purposes
DECLARE @inputString VARCHAR(MAX)= '&amp;testString&amp;'
DECLARE @count INT = 0
DECLARE @id INT = 1
DECLARE @charCode INT, @htmlName VARCHAR(30)
DECLARE @codePos INT, @codeEncoded VARCHAR(7), @startIndex INT
        , @resultString varchar(max)
SELECT @count=COUNT(*) FROM @htmlNames

WHILE @id <=@count
BEGIN
    SELECT @charCode = asciiDecimal, @htmlname = htmlName
    FROM @htmlNames
    WHERE ID = @id

        SET @resultString = LTRIM(RTRIM(@inputString))
        SELECT @startIndex = PATINDEX('%' + @htmlName + '%', @resultString)
        While @startIndex > 0 
        BEGIN
            --PRINT @resultString + '|'  + @htmlName + '|' + NCHAR(@charCode)
            SELECT @resultString = REPLACE(@resultString, @htmlName, NCHAR(@charCode))
            SET @startIndex=PATINDEX('%' + @htmlName + '%', @resultString)
        END
        SET @id=@id + 1
END

PRINT @resultString

GO

Output:

&amp;testString&amp;

I cannot figure out where I'm going wrong. Any help would be much appreciated.

I am not interested in loading the string values into the application layer, applying HTMLDecode there, and saving them back to the database.

EDIT: This line SET @resultString = LTRIM(RTRIM(@inputString)) was inside the WHILE so I was overwriting the result with @inputString. Thank you, YanireRomero.

I like @RichardDeeming's solution too, but it didn't suit my needs in this case.


Answer 1:


Here's a simpler solution that doesn't need a loop:

DECLARE @htmlNames TABLE 
(
    ID INT IDENTITY(1,1), 
    asciiDecimal INT, 
    htmlName varchar(50)
);

INSERT INTO @htmlNames 
VALUES 
    (34,'&quot;'),
    (38,'&amp;'),
    (60,'&lt;'),
    (62,'&gt;'),
    (160,'&nbsp;'),
    (161,'&iexcl;'),
    (162,'&cent;')
;

DECLARE @inputString varchar(max)= '&amp;test&amp;quot;&lt;String&gt;&quot;&amp;';
DECLARE @resultString varchar(max) = @inputString;

-- Simple HTML-decode:
SELECT
    @resultString = Replace(@resultString COLLATE Latin1_General_CS_AS, htmlName, NCHAR(asciiDecimal))
FROM
    @htmlNames
;

SELECT @resultString;
-- Output: &test&quot;<String>"&


-- Multiple HTML-decode:
SET @resultString = @inputString;

DECLARE @temp varchar(max) = '';
WHILE @resultString != @temp
BEGIN
    SET @temp = @resultString;

    SELECT
        @resultString = Replace(@resultString COLLATE Latin1_General_CS_AS, htmlName, NCHAR(asciiDecimal))
    FROM
        @htmlNames
    ;
END;

SELECT @resultString;
-- Output: &test"<String>"&

EDIT: Changed to NCHAR, as suggested by @tomasofen, and added a case-sensitive collation to the REPLACE function, as suggested by @TechyGypo.
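
As a small illustration of why the case-sensitive collation matters, consider entity names that differ only in case, such as &Agrave; and &agrave;. The sketch below is not part of the original answer; the variable name @s and the sample string are made up for the demonstration, and the first result depends on the default collation of whatever database you run it in.

```sql
DECLARE @s NVARCHAR(100) = N'&Agrave; and &agrave;';

-- Uses the default collation of the current database. If that collation is
-- case-insensitive, both entities match and both become the upper-case letter.
SELECT REPLACE(@s, N'&Agrave;', NCHAR(192)) AS default_collation;

-- Forcing a case-sensitive collation: only the exact-case entity is replaced.
SELECT REPLACE(@s COLLATE Latin1_General_CS_AS, N'&Agrave;', NCHAR(192)) AS case_sensitive;
-- Returns: À and &agrave;
```

Note also that the order in which a multi-row SELECT assigns to a variable is not something to rely on, so the single-pass output above may differ if the rows happen to be processed in another order.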




Answer 2:


For the sake of performance, this isn't something you should write as T-SQL statements or as a SQL scalar-valued function. The .NET libraries provide excellent, fast, and, above all, reliable HTML decoding. In my opinion, you should implement this as a SQL CLR function, like this:

using Microsoft.SqlServer.Server;
using System.Data.SqlTypes;
using System.Net;

public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction(
        IsDeterministic = true,
        IsPrecise = true,
        DataAccess = DataAccessKind.None,
        SystemDataAccess = SystemDataAccessKind.None)]
    [return: SqlFacet(MaxSize = 4000)]
    public static SqlString cfnHtmlDecode([SqlFacet(MaxSize = 4000)] SqlString input)
    {
        if (input.IsNull)
            return SqlString.Null;  // pass SQL NULL through unchanged

        return System.Net.WebUtility.HtmlDecode(input.Value);
    }
}

Then in your T-SQL, call it like this:

SELECT clr_schema.cfnHtmlDecode(column_name) FROM table_schema.table_name
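
Before that call works, the compiled assembly has to be registered in the database. A minimal sketch of the registration follows; the assembly name HtmlUtils and the path C:\clr\HtmlUtils.dll are hypothetical, the clr_schema schema is assumed to exist already, and whether PERMISSION_SET = SAFE is enough may depend on your SQL Server version and security settings.

```sql
-- Assumption: the C# class above was compiled to C:\clr\HtmlUtils.dll
-- (hypothetical path and assembly name).

-- Allow user CLR code to run on this instance.
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO

-- Load the compiled assembly into the current database.
CREATE ASSEMBLY HtmlUtils
FROM 'C:\clr\HtmlUtils.dll'
WITH PERMISSION_SET = SAFE;
GO

-- Bind a T-SQL function to the static method: assembly.class.method.
-- Assumption: the schema clr_schema already exists.
CREATE FUNCTION clr_schema.cfnHtmlDecode (@input NVARCHAR(4000))
RETURNS NVARCHAR(4000)
AS EXTERNAL NAME HtmlUtils.UserDefinedFunctions.cfnHtmlDecode;
GO
```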



Answer 3:


It was an assignment error:

DECLARE @htmlNames TABLE (ID INT IDENTITY(1,1), asciiDecimal INT, htmlName varchar(50))
INSERT INTO @htmlNames
VALUES (34,'&quot;'),(38,'&amp;'),(60,'&lt;'),(62,'&gt;'),(160,'&nbsp;'),(161,'&iexcl;'),(162,'&cent;')
-- I would load the full list of HTML names into this TABLE variable, but removed it for testing purposes
DECLARE @inputString VARCHAR(MAX)= '&amp;testString&amp;'
DECLARE @count INT = 0
DECLARE @id INT = 1
DECLARE @charCode INT, @htmlName VARCHAR(30)
DECLARE @codePos INT, @codeEncoded VARCHAR(7), @startIndex INT
    , @resultString varchar(max)
SELECT @count=COUNT(*) FROM @htmlNames

SET @resultString = LTRIM(RTRIM(@inputString))

WHILE @id <=@count
BEGIN

    SELECT @charCode = asciiDecimal, @htmlname = htmlName
    FROM @htmlNames
    WHERE ID = @id

        SELECT @startIndex = PATINDEX('%' + @htmlName + '%', @resultString)

        While @startIndex > 0 
        BEGIN
            --PRINT @resultString + '|'  + @htmlName + '|' + NCHAR(@charCode)
            SET @resultString = REPLACE(@resultString, @htmlName, NCHAR(@charCode))
            SET @startIndex=PATINDEX('%' + @htmlName + '%', @resultString)
        END
        SET @id=@id + 1
END

PRINT @resultString

GO

The line SET @resultString = LTRIM(RTRIM(@inputString)) was inside the outer WHILE, so every iteration overwrote your result with the original input.

Hope it helps.
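
Since REPLACE already substitutes every occurrence in one call, the inner WHILE and the PATINDEX checks are not strictly needed. The following is a trimmed-down sketch of the same loop under that observation; the seq column is a hypothetical addition that makes the processing order explicit, with &amp; handled last so that an input like &amp;lt; is decoded only one level.

```sql
-- seq is a hypothetical explicit ordering column (not in the original code).
DECLARE @htmlNames TABLE (seq INT PRIMARY KEY, asciiDecimal INT, htmlName VARCHAR(50));
INSERT INTO @htmlNames (seq, asciiDecimal, htmlName)
VALUES (1, 34, '&quot;'), (2, 60, '&lt;'), (3, 62, '&gt;'), (4, 38, '&amp;');

DECLARE @resultString NVARCHAR(MAX) = LTRIM(RTRIM(N'&amp;testString&amp;'));
DECLARE @seq INT = 1, @count INT;
SELECT @count = COUNT(*) FROM @htmlNames;

WHILE @seq <= @count
BEGIN
    -- One REPLACE per entity: it replaces all occurrences at once.
    SELECT @resultString = REPLACE(@resultString COLLATE Latin1_General_CS_AS,
                                   htmlName, NCHAR(asciiDecimal))
    FROM @htmlNames
    WHERE seq = @seq;

    SET @seq = @seq + 1;
END;

PRINT @resultString;  -- &testString&
```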




Answer 4:


Some additional help for Richard Deeming's answer, to save some typing for future visitors who want to extend the function with more codes:

INSERT INTO @htmlNames 
    VALUES 
        (34,'&quot;'),
        (38,'&amp;'),
        (60,'&lt;'),
        (62,'&gt;'),

(160, '&nbsp;'),
(161, '&iexcl;'),
(162, '&cent;'),
(163, '&pound;'),
(164, '&curren;'),
(165, '&yen;'),
(166, '&brvbar;'),
(167, '&sect;'),
(168, '&uml;'),
(169, '&copy;'),
(170, '&ordf;'),
(171, '&laquo;'),
(172, '&not;'),
(173, '&shy;'),
(174, '&reg;'),
(175, '&macr;'),

(176, '&deg;'),
(177, '&plusmn;'),
(178, '&sup2;'),
(179, '&sup3;'),
(180, '&acute;'),
(181, '&micro;'),
(182, '&para;'),
(183, '&middot;'),
(184, '&cedil;'),
(185, '&sup1;'),
(186, '&ordm;'),
(187, '&raquo;'),
(188, '&frac14;'),
(189, '&frac12;'),
(190, '&frac34;'),
(191, '&iquest;'),

(192, '&Agrave;'),
(193, '&Aacute;'),
(194, '&Acirc;'),
(195, '&Atilde;'),
(196, '&Auml;'),
(197, '&Aring;'),
(198, '&AElig;'),
(199, '&Ccedil;'),
(200, '&Egrave;'),
(201, '&Eacute;'),
(202, '&Ecirc;'),
(203, '&Euml;'),
(204, '&Igrave;'),
(205, '&Iacute;'),
(206, '&Icirc;'),
(207, '&Iuml;'),

(208, '&ETH;'),
(209, '&Ntilde;'),
(210, '&Ograve;'),
(211, '&Oacute;'),
(212, '&Ocirc;'),
(213, '&Otilde;'),
(214, '&Ouml;'),
(215, '&times;'),
(216, '&Oslash;'),
(217, '&Ugrave;'),
(218, '&Uacute;'),
(219, '&Ucirc;'),
(220, '&Uuml;'),
(221, '&Yacute;'),
(222, '&THORN;'),
(223, '&szlig;'),

(224, '&agrave;'),
(225, '&aacute;'),
(226, '&acirc;'),
(227, '&atilde;'),
(228, '&auml;'),
(229, '&aring;'),
(230, '&aelig;'),
(231, '&ccedil;'),
(232, '&egrave;'),
(233, '&eacute;'),
(234, '&ecirc;'),
(235, '&euml;'),
(236, '&igrave;'),
(237, '&iacute;'),
(238, '&icirc;'),
(239, '&iuml;'),

(240, '&eth;'),
(241, '&ntilde;'),
(242, '&ograve;'),
(243, '&oacute;'),
(244, '&ocirc;'),
(245, '&otilde;'),
(246, '&ouml;'),
(247, '&divide;'),
(248, '&oslash;'),
(249, '&ugrave;'),
(250, '&uacute;'),
(251, '&ucirc;'),
(252, '&uuml;'),
(253, '&yacute;'),
(254, '&thorn;'),
(255, '&yuml;'),
(8364, '&euro;');

EDITED:

If you want the euro symbol to work (and, in general, any character code above 255), you will need to use NCHAR instead of CHAR in Richard Deeming's code.
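
A quick way to see the difference is to compare the two functions directly. This is only an illustrative sketch; the sample string and the variable @s are made up. CHAR returns NULL for values outside 0 through 255, and REPLACE returns NULL when any argument is NULL, so using CHAR for such a code wipes out the whole string rather than just failing to decode one entity.

```sql
SELECT CHAR(8364)  AS char_result,   -- NULL: CHAR only accepts 0 through 255
       NCHAR(8364) AS nchar_result;  -- the euro sign

DECLARE @s NVARCHAR(100) = N'Price: 10 &euro;';

SELECT REPLACE(@s, N'&euro;', CHAR(8364))  AS with_char,   -- NULL
       REPLACE(@s, N'&euro;', NCHAR(8364)) AS with_nchar;  -- Price: 10 €
```

For the same reason it is probably safer to declare the result variable as NVARCHAR(MAX); a VARCHAR variable can only hold characters that exist in its collation's code page.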



Source: https://stackoverflow.com/questions/25432805/sql-server-html-decode-based-on-the-html-names-in-a-string-input
