How to not transform special characters into HTML entities with OWASP AntiSamy

Submitted by 陌路散爱 on 2019-12-09 07:33:25

Question


I use OWASP AntiSamy with the eBay policy file to prevent XSS attacks on my website.

I also use Hibernate Search to index my objects.

When I use this code:

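    import org.owasp.validator.html.AntiSamy;
    import org.owasp.validator.html.CleanResults;
    import org.owasp.validator.html.Policy;

    String html = "special word: été";

    // use the eBay configuration file (xssPolicyFile is a resource defined elsewhere)
    Policy policy = Policy.getInstance(xssPolicyFile.getInputStream());

    AntiSamy as = new AntiSamy();
    CleanResults cr = as.scan(html, policy);

    // result is now: "special word: &eacute;t&eacute;"
    String result = cr.getCleanHTML();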

As you can see, every "é" has been transformed into its HTML entity equivalent, "&eacute;".

My page is served as UTF-8, so I don't need this transformation. Moreover, when I index this text with Hibernate Search, the word is indexed with its HTML entities, so I can't find the word "été" in my index.

How can I force AntiSamy not to transform special characters into their HTML entity equivalents?

Thanks.

PS: an issue has been opened : http://code.google.com/p/owaspantisamy/issues/detail?id=99


Answer 1:


I ran into the same problem this morning.

I have encapsulated AntiSamy in a class, and I use StringEscapeUtils from Apache Commons Lang to restore the special characters.

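    import org.apache.commons.lang.StringEscapeUtils; // commons-lang 2.x

    CleanResults cleanResults = antiSamy.scan(taintedHtml);
    String cleanedHtml = cleanResults.getCleanHTML();
    // Decode entities back into plain characters (unescapeHtml4 in commons-lang3)
    return StringEscapeUtils.unescapeHtml(cleanedHtml);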

The result is cleaned-up HTML in which special characters are no longer entity-escaped.

Hope this helps.
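One caveat with this approach: StringEscapeUtils.unescapeHtml decodes every entity, including &lt;, &gt; and &amp;. Text that AntiSamy deliberately left escaped (for example a user who typed "&lt;script&gt;" into the input) therefore comes back as live markup after the unescape step. A more conservative alternative is to decode only entities that stand for non-ASCII characters and leave the ASCII ones alone. The following is a minimal, stdlib-only sketch of that idea (Java 9+). IntlEntityDecoder and decodeNonAscii are hypothetical names, and the named-entity table is an illustrative subset, not the full list AntiSamy can emit.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class IntlEntityDecoder {

    // Illustrative subset only: a real table would have to cover every named
    // entity the sanitizer can emit for non-ASCII characters.
    private static final Map<String, Integer> NAMED = Map.of(
            "eacute", 0xE9, "egrave", 0xE8, "ecirc", 0xEA,
            "agrave", 0xE0, "ccedil", 0xE7, "ocirc", 0xF4);

    // Matches decimal (&#233;), hex (&#xE9;) and named (&eacute;) entities.
    // Digit counts are capped so Integer.parseInt cannot overflow.
    private static final Pattern ENTITY = Pattern.compile(
            "&(?:#(\\d{1,7})|#[xX]([0-9a-fA-F]{1,6})|([a-zA-Z][a-zA-Z0-9]*));");

    private IntlEntityDecoder() {}

    /** Decodes entities for non-ASCII characters; &lt; &gt; &amp; &quot; etc. stay encoded. */
    public static String decodeNonAscii(String html) {
        Matcher m = ENTITY.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Integer cp;
            if (m.group(1) != null) {
                cp = Integer.parseInt(m.group(1));
            } else if (m.group(2) != null) {
                cp = Integer.parseInt(m.group(2), 16);
            } else {
                cp = NAMED.get(m.group(3)); // null for lt, gt, amp, quot, ...
            }
            boolean decode = cp != null
                    && cp > 0x7F                         // keep ASCII entities escaped
                    && Character.isValidCodePoint(cp)
                    && (cp < 0xD800 || cp > 0xDFFF);     // skip lone surrogates
            String replacement = decode ? new String(Character.toChars(cp)) : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeNonAscii("special word: &eacute;t&eacute; &lt;b&gt;"));
    }
}
```

Calling decodeNonAscii(cr.getCleanHTML()) in place of unescapeHtml restores "été" for the search index while anything AntiSamy escaped as &lt;...&gt; stays inert.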




Answer 2:


As Mohamad said in a comment, AntiSamy has just released a new directive named entityEncodeIntlChars.

Here are the details: http://code.google.com/p/owaspantisamy/source/detail?r=240

It seems that this directive solves the problem.
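Directives live in the policy file's <directives> section, so the simplest route is to add <directive name="entityEncodeIntlChars" value="false"/> to your own copy of the eBay policy. If you would rather leave the bundled file untouched, the following is a minimal stdlib-only sketch (Java 8+) that sets the directive in memory before the XML reaches Policy.getInstance. PolicyDirectivePatcher and withDirective are hypothetical names. The value "false" is an assumption based on the directive's name (false = do not entity-encode international characters); check the AntiSamy release you use.

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public final class PolicyDirectivePatcher {

    private PolicyDirectivePatcher() {}

    /**
     * Returns a copy of the policy XML in which the directive with the given
     * name has the given value, adding the directive (and a directives
     * section) if it is missing.
     */
    public static byte[] withDirective(InputStream policyXml, String name, String value) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(policyXml);
            Element root = doc.getDocumentElement();

            NodeList sections = root.getElementsByTagName("directives");
            Element directives;
            if (sections.getLength() > 0) {
                directives = (Element) sections.item(0);
            } else {
                // Put the new section first, ahead of the other policy sections.
                directives = doc.createElement("directives");
                root.insertBefore(directives, root.getFirstChild());
            }

            // Reuse an existing directive with this name, otherwise append one.
            Element target = null;
            NodeList existing = directives.getElementsByTagName("directive");
            for (int i = 0; i < existing.getLength() && target == null; i++) {
                Element d = (Element) existing.item(i);
                if (name.equals(d.getAttribute("name"))) {
                    target = d;
                }
            }
            if (target == null) {
                target = doc.createElement("directive");
                target.setAttribute("name", name);
                directives.appendChild(target);
            }
            target.setAttribute("value", value);

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("Could not patch policy XML", e);
        }
    }
}
```

With xssPolicyFile as in the question, the patched bytes would be loaded with Policy.getInstance(new ByteArrayInputStream(PolicyDirectivePatcher.withDirective(xssPolicyFile.getInputStream(), "entityEncodeIntlChars", "false"))).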




Answer 3:


After scouring the AntiSamy source code, I found no way of changing this behavior apart from modifying AntiSamy.




Answer 4:


Check out this one: http://code.google.com/p/owaspantisamy/source/browse/#svn/trunk/dotNet/current/source/owaspantisamy/html/scan

Grab the source and note that the key classes (AntiSamyDOMScanner, CleanResults) use standard framework objects (like XmlDocument). Compile it and run against the binary you built, so that you can step through everything in a debugger and see which of the major classes actually corrupts your data. With that in hand, you'll be able to either change a few properties on the major objects to make it stop, or inject your own post-processing to revert the damage (say, with a regexp). Later you can expose that as an additional top-level property, say one named NoMess :-)

Chances are that the behavior in this respect differs between languages (there are 3 in that trunk), but the same tactics will work whichever one you have to deal with.



Source: https://stackoverflow.com/questions/3246739/how-to-not-transform-special-characters-to-html-entities-with-owasp-antisamy
