问题
I have a site that wraps some user-generated content, and I want to be able to separate the markup for the layout, and the markup from the user-generated content, so the u-g content can't break the site layout.
The user-generated content is trusted, as it is coming from a known group of users on my network, but nonetheless only a small subset of html tags are allowed (p, ul/ol/li, em, strong, and a couple more). However, the user-generated content is not guaranteed to be well-formed, and we have had some instances of malformed user-generated content breaking the layout of the site.
We are working with our users to keep the content well-formed, but in the meantime I am trying to find a good way to separate the content from the layout. I have been looking into namespaces, but have been unable to find good documentation about CSS support for embedded namespaces.
Anyone have any good ideas?
EDIT
I have seen some really good suggestions here, but I should probably clarify that I have absolutely no control over the input mechanism that the users use. They are entering content into one system, and my page uses that system's API to pull content out of it. That system is using TinyMCE, but like I said, we are still getting some malformed content.
回答1:
Maybe overkill, but HTML Tidy could help if you can use it.
Use a WYSIWYG like TinyMCE or CKEditor that has built in cleanup methods.
Robert Koritnik's suggestion to use markdown seems brilliant, especially considering that you only allow a few harmless formatting tags.
I don't think there's anything you can do with CSS to stop layouts from breaking due to open HTML tags, so I would probably forget that idea.
回答2:
Why not use markdown
If your users are HTML literate or people that can grasp the concept of markdown syntax I suggest you rather go with that. Stackoverflow works great with it. I can't imagine having a usual rich editor on Stackoverflow. Markdown editors are much simpler and faster to use and provide enough formatting capabilities for most situations. If you need some special additional features you can always add those in but for starters oute of the box capabilities will suffice.
Real-time view for self validation
But don't forget to include a real time view of what users are writing. Self validation makes miracles so they correct their own mistakes before posting data.
回答3:
Instead of parsing the result or forcing the user to use a structured format, just display the content within an iframe:
<iframe id="user_html"></iframe>
<script>
document.getElementById("user_html").src = "data:text/html;charset=utf-8," + escape(content);
</script>
回答4:
I built custom CMS systems exclusively for several years and always had great luck with a combination of a quality WYSIWYG, strong front-end validation, and relentless back-end validation.
I always gravitate toward CKEditor because it's the only front-end editor that can deal with Microsoft Word output on the front end...that's a must-have in my books. Sure, others have a paste from word solution, but good luck getting users to use it. I've actually had a client overload a db insert thanks to Microsoft Word that didn't get scrubbed in Tiny. HTML tidy is a great solution to clean things up prior to validation on the back end.
CK has built-in templates and classes, so I used those to help my users format without going overboard. On the back-end I checked to ensure they hadn't tried any funny business with CSS, but it was never a concern with that group of users. Give them enough (safe) features and they'll never HAVE to go rogue.
来源:https://stackoverflow.com/questions/5859294/keep-user-generated-content-from-breaking-layout