Proper usage of JTidy to purify HTML

梦毁少年i 2021-01-13 03:18

I am trying to use JTidy (jtidy-r938.jar) to sanitize an input HTML string, but I seem to have problems getting the default settings right. Often strings such as "hello wor

3 Answers
  • 2021-01-13 04:04

    Have a look at how JTidy is configured:

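    // 'tidy' is the org.w3c.tidy.Tidy instance you have already configured
    StringWriter writer = new StringWriter();  // java.io.StringWriter
    // second argument true: also print the values currently in effect
    tidy.getConfiguration().printConfigOptions(writer, true);
    System.out.println(writer.toString());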
    

    That may make it clear what is causing the problem.

    What exactly is weird? Could you post a small example of the actual output and the output you expected?

  • 2021-01-13 04:04

    Here is how we are calling JTidy from Ant. You may infer the API call from it:

    <tidy destdir="${build.dir.result}">
      <fileset dir="${src}" includes="**/*.htm"/>
      <parameter name="tidy-mark" value="false"/>
      <parameter name="output-xml" value="no"/>
      <parameter name="numeric-entities" value="yes"/>
      <parameter name="indent-spaces" value="2"/>
      <parameter name="indent-attributes" value="no"/>
      <parameter name="markup" value="yes"/>
      <parameter name="wrap" value="2000"/>
      <parameter name="uppercase-tags" value="no"/>
      <parameter name="uppercase-attributes" value="no"/>
      <parameter name="quiet" value="no"/>
      <parameter name="clean" value="yes"/>
      <parameter name="show-warnings" value="yes"/>
      <parameter name="break-before-br" value="yes"/>
      <parameter name="hide-comments" value="yes"/>
      <parameter name="char-encoding" value="latin1"/>
      <parameter name="output-html" value="yes"/>
    </tidy>
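
To make the step from the Ant task to Java a little more concrete, here is a minimal sketch that collects the same option names and values in a `java.util.Properties` object. The class name `TidyAntOptions` and the method `antEquivalentOptions` are made up for illustration. The sketch itself uses only the JDK; handing the result to JTidy through `Tidy.setConfigurationFromProps(Properties)` is shown only as a comment and is an assumption you should check against the JTidy version you have on the classpath.

```java
import java.util.Properties;
import java.util.TreeSet;

public class TidyAntOptions {

    /** Returns the option names and values used in the Ant <tidy> task above. */
    public static Properties antEquivalentOptions() {
        Properties props = new Properties();
        props.setProperty("tidy-mark", "false");
        props.setProperty("output-xml", "no");
        props.setProperty("numeric-entities", "yes");
        props.setProperty("indent-spaces", "2");
        props.setProperty("indent-attributes", "no");
        props.setProperty("markup", "yes");
        props.setProperty("wrap", "2000");
        props.setProperty("uppercase-tags", "no");
        props.setProperty("uppercase-attributes", "no");
        props.setProperty("quiet", "no");
        props.setProperty("clean", "yes");
        props.setProperty("show-warnings", "yes");
        props.setProperty("break-before-br", "yes");
        props.setProperty("hide-comments", "yes");
        props.setProperty("char-encoding", "latin1");
        props.setProperty("output-html", "yes");
        return props;
    }

    public static void main(String[] args) {
        Properties props = antEquivalentOptions();

        // Assumption: with jtidy-r938.jar on the classpath, the options could
        // be applied to a Tidy instance roughly like this:
        //   org.w3c.tidy.Tidy tidy = new org.w3c.tidy.Tidy();
        //   tidy.setConfigurationFromProps(props);

        // Print the options in alphabetical order for inspection.
        for (String name : new TreeSet<>(props.stringPropertyNames())) {
            System.out.println(name + ": " + props.getProperty(name));
        }
    }
}
```

Keeping the options in one `Properties` object makes it easy to compare them with what `printConfigOptions` reports, so a setting that silently differs from the Ant build is easier to spot.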
    
  • 2021-01-13 04:13

    Well, this seems to be a bug in JTidy. For the exact file that causes the problem, see the tracker entry:

    http://sourceforge.net/tracker/?func=detail&aid=2985849&group_id=13153&atid=113153

    Thanks for all the help folks!
