Convert special chars to HTML entities, without changing tags and parameters

生来不讨喜 2021-01-16 18:57

I'm using the FreeTextBox editor to get some HTML created by users. The problem is that this editor does not convert special characters into HTML entities, with the exception of "<

3 Answers
  • 2021-01-16 19:11

    After searching a lot, I've found that I was using the wrong property of the FreeTextBox component. The property I needed is ConvertHtmlSymbolsToHtmlCodes, which has to be set to true.

    It also helps to use FormatHtmlTagsToXhtml if you need to insert your code into XHTML pages, because it applies strict validation to tag attributes and the quotes surrounding them.

  • 2021-01-16 19:14

    I would suggest parsing through each element using LINQ to XML and encoding the value of each element and attribute node. I'll try to come up with some code, but hey, it's 5pm on a Friday!
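    LINQ to XML is a .NET API. As a rough, language-neutral illustration of the same idea, here is a sketch using Python's standard xml.etree.ElementTree instead. It assumes the user's HTML is a well-formed XML fragment (so HTML-only named entities such as &nbsp; will not parse), wraps it in a made-up <root> element, and re-serializes it as US-ASCII so that element text and attribute values get &, < and > escaped and non-ASCII characters become numeric character references. The function name encode_fragment is hypothetical.

```python
import xml.etree.ElementTree as ET


def encode_fragment(fragment: str) -> str:
    """Re-encode the text and attribute values of a well-formed XML fragment.

    Assumption: `fragment` is well-formed XML once wrapped in a single root
    element; ET.fromstring raises ET.ParseError otherwise.
    """
    # Wrap the fragment so that mixed content (text plus several elements)
    # parses as one document. The <root> name is arbitrary.
    root = ET.fromstring(f"<root>{fragment}</root>")
    # Serializing as US-ASCII makes ElementTree escape &, < and > in text,
    # escape quotes in attribute values, and replace every non-ASCII
    # character with a numeric reference such as &#233;.
    out = ET.tostring(root, encoding="us-ascii").decode("ascii")
    if out == "<root />":  # empty input serializes as a self-closing root
        return ""
    # Strip the wrapper element added above.
    return out[len("<root>"):-len("</root>")]


print(encode_fragment('<p title="café">Ça &amp; là</p>'))
# <p title="caf&#233;">&#199;a &amp; l&#224;</p>
```

    Note that ElementTree writes empty elements as <br /> regardless of how they were written in the input, which is harmless in XHTML.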

  • 2021-01-16 19:24

    If you've got a mixture of < meaning start a tag and < meaning a literal less-than sign, you can't possibly tell which is ‘a tag’ to ignore and which isn't.

    About all you could do is detect uses of < that aren't part of a conventionally formed start or end tag, using a nasty, unreliable regex something like:

    <(?!\w+(\s+\w+="[^"<]*")*\s*/?>|/\w+\s*>)
    

    and replace them with &lt;. Similarly for & with &amp;:

    &(?!\w+;|#\d+;|#x[0-9A-Fa-f]+;)
    

    (> does not normally have to be escaped.)
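    Here is a sketch of how the two patterns above might be applied, using Python's re module (the names STRAY_LT, STRAY_AMP and escape_stray_markup are made up for this example). It inherits every weakness of the regexes: it only recognises tags whose attributes are double-quoted name="value" pairs.

```python
import re

# Matches a < that does NOT begin a simple start tag (<b>, <a href="x">, <br/>)
# or end tag (</b>). Attributes are assumed to be double-quoted.
STRAY_LT = re.compile(r'<(?!\w+(\s+\w+="[^"<]*")*\s*/?>|/\w+\s*>)')

# Matches an & that does NOT begin a named (&amp;), decimal (&#38;)
# or hexadecimal (&#x26;) character reference.
STRAY_AMP = re.compile(r'&(?!\w+;|#\d+;|#x[0-9A-Fa-f]+;)')


def escape_stray_markup(text: str) -> str:
    """Escape & and < characters that don't look like entities or tags."""
    # Escape & first; the &lt; inserted afterwards is already a valid entity.
    text = STRAY_AMP.sub("&amp;", text)
    return STRAY_LT.sub("&lt;", text)


print(escape_stray_markup('if a < b & c <b>bold</b> &amp; done'))
# if a &lt; b &amp; c <b>bold</b> &amp; done
```

    The failure modes show up quickly: escape_stray_markup('<!-- note -->') returns '&lt;!-- note -->', and a single-quoted attribute such as <a href='x'> has its < escaped as well.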

    This won't accept every valid way of constructing elements; it will let through broken, mis-nested elements and non-existent entities; and it will mangle non-element constructs such as comments. That's because regex can't parse HTML, let alone HTML with added crunchy broken bits.

    So it's hardly foolproof. If you want proper markup that won't break your page when a user accidentally leaves a div open, the best first step is to parse it as XHTML and reject it with an error if it isn't well-formed XML.
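    A minimal sketch of that parse-and-reject step, using Python's standard xml.etree.ElementTree as a stand-in for whatever XML parser your platform provides. The wrapper element and the function name is_well_formed are assumptions for illustration. An XML parser will also reject HTML-only named entities such as &nbsp;, so those would have to be written as numeric references (&#160;).

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML inside a wrapper element."""
    try:
        # The wrapper lets a fragment contain several top-level elements.
        ET.fromstring(f"<root>{fragment}</root>")
    except ET.ParseError:
        return False
    return True


user_input = "<div><p>Hello</p>"  # the <div> is never closed
if not is_well_formed(user_input):
    print("Rejected: markup is not well-formed")
```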

    If you have a rich text editor component that generates output where a literal < is not escaped, then it's time to replace that component with something less appalling. But in general it's not a good idea to let users create HTML, because they're really rubbish at it. Plus, letting anyone input HTML gives them full power to wreck the site and its security with JavaScript. A simpler text-markup language is often a win.
