C# HtmlDecode Specific tags only

忘了有多久 2021-01-17 00:53

I have a large HTML-encoded string and I want to decode only specific, whitelisted HTML tags.

Is there a way to do this in C#? WebUtility.HtmlDecode() decodes everything.

3 Answers
  • 2021-01-17 01:35

    A better approach could be to use an HTML parser such as HtmlAgilityPack, CsQuery, or NSoup to find the specific elements and decode them in a loop.

    Check this for links and examples of parsers.

    Here is how I did it using CsQuery:

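    // Requires the CsQuery NuGet package and a reference to System.Web.
    // using System; using System.Collections.Generic; using System.Linq; using System.Web;

    // Sample data: "output" is the desired result, with only <strong> and <br> decoded.
    string input = "<span>i am <strong color=blue>very</strong> big <br>man.</span>";
    string output = "&lt;span&gt;i am <strong color=blue>very</strong> big <br>man.&lt;/span&gt;";

    // Decode everything first so the parser sees real markup.
    var decoded = HttpUtility.HtmlDecode(output);
    var encoded = input; // HttpUtility.HtmlEncode(decoded);

    Console.WriteLine(encoded);
    Console.WriteLine(decoded);

    var doc = CsQuery.CQ.CreateDocument(decoded);

    // Whitelisted elements; ToList() snapshots them before the DOM is modified.
    var paras = doc.Select("strong").Union(doc.Select("br")).ToList();

    var tags = new List<KeyValuePair<string, string>>();
    var counter = 0;

    foreach (var element in paras)
    {
        // Swap each whitelisted element for a placeholder and remember its markup.
        var key = "---" + counter + "---";
        var value = HttpUtility.HtmlDecode(element.OuterHTML);
        var pair = new KeyValuePair<string, string>(key, value);

        element.OuterHTML = key;
        tags.Add(pair);
        counter++;
    }

    // Encode what is left; the placeholders contain nothing that gets encoded.
    var finalstring = HttpUtility.HtmlEncode(doc.Document.Body.InnerHTML);

    // Put the markup of the whitelisted elements back in place of the placeholders.
    foreach (var element in tags)
    {
        finalstring = finalstring.Replace(element.Key, element.Value);
    }

    Console.WriteLine(finalstring);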
    
  • 2021-01-17 01:40

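    You could do something like this:

    // using System.Collections.Generic; using System.Net; using System.Text.RegularExpressions;
    public string DecodeSpecificTags(List<string> whiteListedTagNames, string encodedInput)
    {
        foreach (string s in whiteListedTagNames)
        {
            // Matches an encoded opening or closing tag, e.g. &lt;strong ...&gt; or &lt;/strong&gt;.
            // \b keeps "b" from also matching "br" or "body".
            string regex = @"&lt;\s*/?\s*" + Regex.Escape(s) + @"\b.*?&gt;";
            // Decode only the matched tag; the rest of the input stays encoded.
            encodedInput = Regex.Replace(encodedInput, regex, m => WebUtility.HtmlDecode(m.Value));
        }
        return encodedInput;
    }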
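
    To show the regex idea end to end, here is a minimal, self-contained sketch that handles all whitelisted names in a single pass. The class name `WhitelistDecoder` and the sample string are assumptions made for illustration. It only relies on `Regex.Replace` with a match evaluator and `WebUtility.HtmlDecode`, and it will cut a tag short if an attribute value itself contains an encoded `&gt;`.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Net;
    using System.Text.RegularExpressions;

    public static class WhitelistDecoder // hypothetical name, for illustration only
    {
        // Decodes only the encoded tags whose name is whitelisted;
        // every other entity in the input is left as it is.
        public static string DecodeSpecificTags(IEnumerable<string> whitelistedTagNames, string encodedInput)
        {
            // Build one alternation such as "strong|br", escaping each name.
            string names = string.Join("|", whitelistedTagNames.Select(n => Regex.Escape(n)));
            if (names.Length == 0)
            {
                return encodedInput;
            }

            // &lt; + optional "/" + tag name + word boundary + anything up to the next &gt;
            string pattern = @"&lt;\s*/?\s*(?:" + names + @")\b.*?&gt;";

            return Regex.Replace(
                encodedInput,
                pattern,
                match => WebUtility.HtmlDecode(match.Value),
                RegexOptions.IgnoreCase | RegexOptions.Singleline);
        }
    }

    public static class Program
    {
        public static void Main()
        {
            string encoded = "&lt;span&gt;i am &lt;strong color=blue&gt;very&lt;/strong&gt; big &lt;br&gt;man.&lt;/span&gt;";
            string result = WhitelistDecoder.DecodeSpecificTags(new[] { "strong", "br" }, encoded);

            // Prints: &lt;span&gt;i am <strong color=blue>very</strong> big <br>man.&lt;/span&gt;
            Console.WriteLine(result);
        }
    }
    ```

    Because the match evaluator decodes only the matched text, the surrounding `&lt;span&gt;` stays encoded, which is the behavior the question asks for.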
    
  • 2021-01-17 01:48

    Alternatively, you could use HtmlAgilityPack with a blacklist or a whitelist, depending on your requirements. I use the blacklist approach; my blacklisted tags are stored in a text file, for example "script|img".

    // Requires the HtmlAgilityPack NuGet package and a reference to System.Web.
    // Must be declared inside a static class because it is an extension method.
    public static string DecodeSpecificTags(this string content, List<string> blackListedTags)
    {
        if (string.IsNullOrEmpty(content))
        {
            return content;
        }

        blackListedTags = blackListedTags.Select(t => t.ToLowerInvariant()).ToList();

        // Decode everything, then parse the result to locate the blacklisted elements.
        var decodedContent = HttpUtility.HtmlDecode(content);
        var document = new HtmlDocument();
        document.LoadHtml(decodedContent);

        foreach (var blackListedTag in blackListedTags)
        {
            foreach (var htmlNode in document.DocumentNode.Descendants(blackListedTag))
            {
                // Re-encode the markup of each blacklisted element in place.
                // This only works when WriteTo() reproduces the source markup verbatim.
                var nodeContent = htmlNode.WriteTo();
                decodedContent = decodedContent.Replace(nodeContent, HttpUtility.HtmlEncode(nodeContent));
            }
        }

        return decodedContent;
    }
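
    The pipe-separated format mentioned in this answer ("script|img") has to be turned into the `List<string>` that the method expects. A minimal sketch of that step follows; the helper name `TagListParser` is hypothetical and not part of the answer.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class TagListParser // hypothetical helper, for illustration only
    {
        // Turns "script|img" into ["script", "img"]: trimmed, lower-cased, no duplicates.
        public static List<string> Parse(string pipeSeparated)
        {
            return (pipeSeparated ?? string.Empty)
                .Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(tag => tag.Trim().ToLowerInvariant())
                .Where(tag => tag.Length > 0)
                .Distinct()
                .ToList();
        }
    }
    ```

    Assuming the list lives in a file such as `blacklist.txt` (a made-up path), the call could look like `content.DecodeSpecificTags(TagListParser.Parse(File.ReadAllText("blacklist.txt")))`.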
    