Xerces DOM parser incredibly slow?

暗喜 2021-01-14 17:16

Currently, I am trying to clean up an HTML file using JTidy, convert it to XHTML and provide the results to a DOM parser. The following code is the result of these efforts:

2 Answers
  • 2021-01-14 17:47

    The HTML DTDs are huge and pull in further files through includes, so fetching them takes forever. Use an XML catalog: it lets you store the DTDs locally and map them by their system ID.

    If you use a build tool such as Maven, you will find sufficient pointers for setting one up.

    The advantage over intercepting entity requests, as the accepted answer suggests, is that you still receive the correct characters.
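To make the catalog idea concrete, here is a minimal sketch using the `javax.xml.catalog` API that ships with the JDK (Java 11+ assumed because of `Files.writeString`). Everything in it is illustrative: the class name `CatalogDemo`, the temporary directory, and above all the one-line DTD, which is only a stand-in for a real local copy of the XHTML DTD and its entity files.

```java
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CatalogDemo {

    // System ID that JTidy-style XHTML output typically declares in its DOCTYPE.
    static final String SYSTEM_ID = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd";

    // Sample input that relies on a named entity defined by the DTD.
    static final String SAMPLE =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"" + SYSTEM_ID + "\">"
        + "<html><body><p>caf&eacute;</p></body></html>";

    // Writes a catalog plus a local DTD into a temp directory and returns the catalog path.
    // ASSUMPTION: the DTD below is a tiny stand-in; in practice you would save the
    // real DTD (and the files it includes) next to the catalog.
    static Path writeCatalog() throws Exception {
        Path dir = Files.createTempDirectory("xml-catalog");
        Files.writeString(dir.resolve("xhtml1-strict.dtd"), "<!ENTITY eacute \"&#233;\">\n");
        Path catalog = dir.resolve("catalog.xml");
        Files.writeString(catalog,
            "<?xml version=\"1.0\"?>\n"
            + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
            // Map the remote system ID to the local file (relative to the catalog).
            + "  <system systemId=\"" + SYSTEM_ID + "\" uri=\"xhtml1-strict.dtd\"/>\n"
            + "</catalog>\n");
        return catalog;
    }

    // Parses the XHTML string with a resolver backed by the given catalog file.
    static Document parse(String xhtml, Path catalog) throws Exception {
        CatalogResolver resolver =
            CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalog.toUri());
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(resolver); // CatalogResolver is also an EntityResolver
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE, writeCatalog());
        // The entity is expanded from the local DTD, so no network access is needed.
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Because the parser still reads a DTD, just a local one, the entity reference is expanded to the real character instead of being dropped.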

  • 2021-01-14 17:52

    Even when not validating, an XML parser needs to fetch the DTD, for example to support named character entities. You should look into implementing an EntityResolver that resolves the request for the DTD to a local copy.
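A minimal sketch of such an EntityResolver, assuming the document declares the XHTML 1.0 Strict public ID. The class name `LocalDtdResolverDemo` and the in-memory `LOCAL_DTD` string are hypothetical: the string is a heavily trimmed stand-in for a DTD file you would normally keep on disk or on the classpath.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class LocalDtdResolverDemo {

    static final String PUBLIC_ID = "-//W3C//DTD XHTML 1.0 Strict//EN";

    // ASSUMPTION: stand-in for the locally stored DTD; it defines a single entity.
    static final String LOCAL_DTD = "<!ENTITY nbsp \"&#160;\">\n";

    static final String SAMPLE =
        "<!DOCTYPE html PUBLIC \"" + PUBLIC_ID + "\" "
        + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
        + "<html><body><p>a&nbsp;b</p></body></html>";

    // Answers the parser's DTD request from the local copy instead of the network.
    static final EntityResolver RESOLVER = (publicId, systemId) -> {
        if (PUBLIC_ID.equals(publicId)) {
            InputSource source = new InputSource(new StringReader(LOCAL_DTD));
            source.setSystemId(systemId); // keep the original ID as the base URI
            return source;
        }
        return null; // anything else falls back to the parser's default lookup
    };

    static Document parse(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(RESOLVER);
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        // The named entity is expanded using the locally supplied declaration.
        System.out.println(doc.getDocumentElement().getTextContent().length());
    }
}
```

Returning `null` for unknown IDs keeps the default behaviour for everything except the DTD you have a copy of; a stricter variant could throw instead, so that no remote fetch ever happens silently.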
