I am trying to use JTidy (jtidy-r938.jar) to sanitize an input HTML string, but I seem to have problems getting the default settings right. Often strings such as "hello wor
Have a look at how JTidy is configured:
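Tidy tidy = new Tidy(); // org.w3c.tidy.Tidy; use your own instance if you already have one
StringWriter writer = new StringWriter(); // java.io.StringWriter
// Passing true asks for the configuration values currently in effect
tidy.getConfiguration().printConfigOptions(writer, true);
System.out.println(writer.toString());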
Maybe it then becomes clear what causes the problem.
What exactly is weird? Could you post a small example showing the actual output and the output you expected?
Here is how we are calling JTidy from Ant. You may infer the API call from it:
<tidy destdir="${build.dir.result}">
<fileset dir="${src}" includes="**/*.htm"/>
<parameter name="tidy-mark" value="false"/>
<parameter name="output-xml" value="no"/>
<parameter name="numeric-entities" value="yes"/>
<parameter name="indent-spaces" value="2"/>
<parameter name="indent-attributes" value="no"/>
<parameter name="markup" value="yes"/>
<parameter name="wrap" value="2000"/>
<parameter name="uppercase-tags" value="no"/>
<parameter name="uppercase-attributes" value="no"/>
<parameter name="quiet" value="no"/>
<parameter name="clean" value="yes"/>
<parameter name="show-warnings" value="yes"/>
<parameter name="break-before-br" value="yes"/>
<parameter name="hide-comments" value="yes"/>
<parameter name="char-encoding" value="latin1"/>
<parameter name="output-html" value="yes"/>
</tidy>
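If you want to carry these settings over to Java code, one option is to collect them in a java.util.Properties object first. The following is only a rough sketch: the class name TidyOptions and the method fromAntExample are made up for illustration, and the hand-off to JTidy is left in a comment because it assumes that Tidy.setConfigurationFromProps and Tidy.parse(Reader, Writer) are available in your JTidy version and that the option names are accepted unchanged.

```java
import java.util.Properties;

// Hypothetical helper class; the name is made up for this sketch.
class TidyOptions {

    // Mirrors the <parameter> elements of the Ant task above,
    // one property per element, with the values copied verbatim.
    static Properties fromAntExample() {
        Properties props = new Properties();
        props.setProperty("tidy-mark", "false");
        props.setProperty("output-xml", "no");
        props.setProperty("numeric-entities", "yes");
        props.setProperty("indent-spaces", "2");
        props.setProperty("indent-attributes", "no");
        props.setProperty("markup", "yes");
        props.setProperty("wrap", "2000");
        props.setProperty("uppercase-tags", "no");
        props.setProperty("uppercase-attributes", "no");
        props.setProperty("quiet", "no");
        props.setProperty("clean", "yes");
        props.setProperty("show-warnings", "yes");
        props.setProperty("break-before-br", "yes");
        props.setProperty("hide-comments", "yes");
        props.setProperty("char-encoding", "latin1");
        props.setProperty("output-html", "yes");
        return props;
    }

    public static void main(String[] args) {
        Properties props = fromAntExample();

        // With the JTidy jar on the classpath, the hand-off would look
        // roughly like this (assumption: these methods exist in your version):
        //   org.w3c.tidy.Tidy tidy = new org.w3c.tidy.Tidy();
        //   tidy.setConfigurationFromProps(props);
        //   tidy.parse(new java.io.StringReader(html), new java.io.StringWriter());

        // Print the collected options so they can be compared with
        // what printConfigOptions reports.
        props.list(System.out);
    }
}
```

Comparing this list with the printConfigOptions dump suggested earlier should show whether an option was silently ignored or overridden by a default.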
Well, this seems to be a bug in JTidy. For the exact file that causes the problem, see the tracker entry here:
http://sourceforge.net/tracker/?func=detail&aid=2985849&group_id=13153&atid=113153
Thanks for all the help folks!