Best Practice: User generated HTML cleaning

旧巷少年郎 2021-01-02 22:47

I'm coding a WYSIWYG editor with designMode="on" on an iframe. The editor works fine, and I store the code as-is in the database.

Before outputting the HTML I need

4 Answers
  • 2021-01-02 23:09

    I looked into the same question recently with Perl as the server-side language.

    While doing so I ran into HTML Purifier, which may be what you want. Since it is written in PHP rather than Perl, I did not actually test it.

    My research also led me to conclude that this is a very tricky business. If possible, consider using a simplified markup language like Markdown instead, as suggested by Hank Gay.

  • 2021-01-02 23:09

    If you are familiar with ASP.NET, just call Server.HtmlEncode() to convert special characters such as < and > to "&lt;" and "&gt;".

    In PHP, you can use the htmlspecialchars() function.

    Once the special characters are encoded, the browser displays the markup as literal text instead of interpreting it, which prevents cross-site scripting.
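
    As a rough illustration of the encoding approach described above, here is a minimal Python sketch. Python is used only for illustration, since the answer names ASP.NET and PHP; the function name `escape_for_output` and the sample string are assumptions. `html.escape` from the standard library plays the same role as the functions mentioned above.

```python
import html


def escape_for_output(user_html: str) -> str:
    """Encode &, <, > and both quote characters so markup is shown as text."""
    return html.escape(user_html, quote=True)


if __name__ == "__main__":
    sample = '<b>hi</b><script>alert("x")</script>'
    # Every tag, including the harmless <b>, becomes visible text.
    print(escape_for_output(sample))
```

    Note that this encodes all markup, so formatting produced by the editor (bold, lists, links) is shown as text too. If the formatting has to survive, filtering against an allowed set of tags is the alternative.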

  • 2021-01-02 23:29

    The best practice is to allow only the things you know aren't dangerous, and to remove or escape everything else. See the paper Automated Malicious Code Detection and Removal on the Web (OWASP AntiSamy) for a discussion of this (the library is for Java, but the principles apply to any language).
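
    The allow-only-known-safe idea can be sketched with Python's standard `html.parser` module. This is an illustrative sketch, not AntiSamy's algorithm: the tag list, attribute list, URL schemes, and the names `AllowlistSanitizer` and `sanitize` are all assumptions chosen for the example. It does not rebalance unclosed tags, and a maintained library is likely the safer choice in production.

```python
from html import escape
from html.parser import HTMLParser

# Assumed policy for illustration; adjust to what the editor can produce.
ALLOWED_TAGS = {"p", "br", "b", "i", "em", "strong", "ul", "ol", "li", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_SCHEMES = ("http://", "https://", "mailto:")
VOID_TAGS = {"br"}                 # tags that never take a closing tag
DROP_CONTENT = {"script", "style"}  # tags whose text content is discarded too


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag, keep its text content
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            # Only absolute URLs with a known-safe scheme are kept.
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))  # re-encode text


def sanitize(user_html: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(user_html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(sanitize('<p onclick="x()">Hi <b>there</b><script>alert(1)</script></p>'))
```

    Anything not named in the policy is dropped by default, so a tag or attribute the policy author never thought of cannot slip through. A blocklist cannot offer that.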

  • 2021-01-02 23:31

    If you're really bent on allowing this, you should use a whitelist approach.

    The best approach is probably to disallow HTML and use a simplified markup format instead; you can pre-render to HTML and store that in the database if performance is a concern. Avoiding these sorts of problems is one of the big reasons for using Markdown, Textile, reStructuredText, etc.

    NOTE: I linked to GitHub-Flavored Markdown (GFM), not Standard Markdown (SM). GFM addresses some common problems that end-users have with SM.
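
    To make the "simplified markup, pre-rendered to HTML" idea concrete, here is a toy Python sketch. It is not Markdown, GFM, or any real markup implementation: the function name `render_simple_markup` and the tiny syntax it supports (`**bold**`, `*italic*`, blank-line paragraphs) are assumptions for illustration. The point is the order of operations: escape everything first, then generate only the tags the renderer itself knows about.

```python
import html
import re


def render_simple_markup(text: str) -> str:
    """Render a tiny, hypothetical markup subset to HTML."""
    # 1. Neutralise any raw HTML the user typed.
    escaped = html.escape(text, quote=True)
    # 2. Convert the supported inline syntax; bold first so ** is not read as *.
    escaped = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", escaped)
    escaped = re.sub(r"\*(.+?)\*", r"<em>\1</em>", escaped)
    # 3. Blank lines separate paragraphs.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", escaped) if p.strip()]
    return "\n".join(f"<p>{p}</p>" for p in paragraphs)


if __name__ == "__main__":
    source = "Hello **world**\n\n<script>x</script>"
    rendered = render_simple_markup(source)
    # Store `rendered` next to `source` if rendering on every request is too slow.
    print(rendered)
```

    Because the only tags in the output are the ones the renderer emits, there is no user-supplied HTML left to clean. In practice a real Markdown or Textile library would replace this toy function.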
