I want to allow users to create tiny templates that I then render in Django with a predefined context. I am assuming the Django rendering is safe (I asked a question about t
"Use a markup language that produces safe HTML."
Clearly, the only sensible approach.
"The problem with this is that most markup languages are not very powerful layout-wise."
False.
"no way to center elements in ReST."
False.
Centering is a style -- a CSS feature -- not a markup feature.
What you want for centering is to assign a CSS class to a piece of text. The .. class:: directive does this.
You can also define your own interpreted text role, if that's necessary for specifying an inline class on a piece of <span> markup.
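To make that concrete, here is a rough sketch of both ideas, assuming the third-party docutils package is installed. The class name center and the role name highlight are made up for illustration, and rest_to_html is a hypothetical helper, not part of docutils. The two settings overrides switch off file inclusion and raw HTML passthrough, which you would want for user-supplied input.

```python
# The ReST source a user might submit. "center" and "highlight" are
# illustrative names; the matching CSS rules live in your own stylesheet.
REST_SOURCE = """\
.. role:: highlight

.. class:: center

This paragraph gets the center class.

This one has an :highlight:`inline span` with its own class.
"""


def rest_to_html(source):
    """Render untrusted ReST to an HTML fragment (hypothetical helper)."""
    # Imported here so the module loads even where docutils is missing.
    from docutils.core import publish_parts

    parts = publish_parts(
        source=source,
        writer_name="html",
        settings_overrides={
            "file_insertion_enabled": False,  # no include-style file access
            "raw_enabled": False,             # no raw HTML passthrough
        },
    )
    return parts["body"]
```

Calling rest_to_html(REST_SOURCE) should give a paragraph carrying class="center" and a <span class="highlight"> around the inline text. A stylesheet rule such as .center { text-align: center; } then does the actual centering.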
Seeing Pekka's answer, I tried to quickly Google an HTML Purifier equivalent in Python. Here's what I came up with: Python HTML Sanitizer. At first glance, it looks pretty good to me.
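For a feel of what such a sanitizer does, here is a minimal allowlist sketch using only the standard library. It is not the library linked above and it has not been audited; the tag and attribute lists, the class name AllowlistSanitizer and the function sanitize are all assumptions chosen for illustration. For real use, prefer a maintained sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Illustrative allowlists; a real deployment would tune these carefully.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li", "br", "span", "div"}
ALLOWED_ATTRS = {"class"}
VOID_TAGS = {"br"}                      # tags that never get a closing tag
DROP_CONTENT_TAGS = {"script", "style"}  # drop the tag and everything inside


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are dropped, their text content is kept
        kept = ""
        for name, value in attrs:
            if name in ALLOWED_ATTRS:
                kept += ' %s="%s"' % (name, escape(value or "", quote=True))
        self.out.append("<%s%s>" % (tag, kept))

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            # Text arrives unescaped, so escape it again on the way out.
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return "".join(parser.out)
```

With this sketch, sanitize('<p class="x" onclick="evil()">Hi <script>alert(1)</script><b>there</b></p>') keeps the paragraph, the class attribute and the bold text, and drops the onclick handler and the script. It does nothing about CSS or unbalanced tags.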
You are overlooking server-side security issues. You need to be very careful that users can't use the template's import or include mechanism to access files they don't have permission to access.
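One way to guard an include mechanism is to resolve every requested name against a fixed base directory and refuse anything that lands outside it. This is only a sketch of that check; resolve_user_template and its arguments are hypothetical names, and it assumes user templates live under a single directory.

```python
from pathlib import Path


def resolve_user_template(base_dir, requested_name):
    """Return the resolved path for requested_name, or raise if it escapes base_dir."""
    base = Path(base_dir).resolve()
    # Joining with an absolute name discards base, and resolve() collapses
    # "..", so both tricks end up outside base and fail the check below.
    candidate = (base / requested_name).resolve()
    if base not in candidate.parents:
        raise PermissionError("template path escapes the allowed directory")
    return candidate
```

A name like "cards/small.html" resolves inside the base directory, while "../../etc/passwd" or "/etc/passwd" raises PermissionError.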
The bigger challenge is to prevent infinite loops and unbounded recursion in the template system. This is an obvious threat to system performance, and depending on the implementation and deployment setup, the server may never time out. With a finite number of Python threads at your disposal, repeated calls to a misbehaving template could quickly bring your site down.
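One possible mitigation, sketched here under the assumption that rendering can be moved into a separate process, is to give each render a hard time budget and kill it when the budget runs out. render_with_timeout is a hypothetical helper, and the script string stands in for whatever code actually renders the user's template.

```python
import subprocess
import sys


def render_with_timeout(render_script, timeout_seconds):
    """Run render_script in a child Python process and return its stdout.

    Raises TimeoutError if the child exceeds timeout_seconds; the child
    is killed in that case, so a runaway render cannot hold a worker.
    """
    try:
        completed = subprocess.run(
            [sys.executable, "-c", render_script],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        raise TimeoutError("template rendering exceeded its time budget") from None
    if completed.returncode != 0:
        raise RuntimeError(completed.stderr.strip())
    return completed.stdout
```

A well-behaved script returns its output as usual, while something like "while True: pass" is stopped after the budget and surfaces as a TimeoutError. Starting a process per render is expensive, so treat this as an illustration of the idea and not as a tuned design.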
There's the PHP-based HTML Purifier. I have not used it myself yet, but I have heard very good things about it. They promise a lot:
HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards compliant, something only achievable with a comprehensive knowledge of W3C's specifications.
Maybe it's worth a try even though it's not Python-based. Update: @Matchu has found a Python-based alternative that looks good too.
You'll have a lot of very difficult edge cases, though; just think about Flash embeds. Plus, malicious uses of position: absolute are extremely difficult to track down (there's position: relative, which could achieve the same effect but is also a completely legitimate layout tool). Maybe take a look at what, for example, eBay allows and doesn't allow? If anybody has the necessary experience to know what's dangerous and what isn't from millions of examples, they do.
Related resources on eBay:
HTML & JavaScript with examples
Site Interference (it's unclear, though, what is just forbidden and what gets filtered)
From what I found, they don't seem to publish their internal HTML blacklists, but output an error message if forbidden code is found. (Probably a wise move on their part, but unfortunate for the purposes of this question.)