Is Scala/Java not respecting the W3C “excess DTD traffic” specs?

青春惊慌失措 2021-02-01 07:40

I'm new to Scala, so I may be off base here, but I want to know whether the problem is my code. Given the Scala file httpparse, simplified to:

object Http {
   import         


        
  •  猫巷女王i
    2021-02-01 08:27

    There are two problems with what you are trying to do:

    • Scala's XML parser is trying to physically retrieve the DTD when it shouldn't. J-16 SDiZ seems to have some advice for this problem.
    • The Stack Overflow page you are trying to parse isn't XML; it's HTML 4 Strict.
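For the first problem, a minimal sketch of one common way to stop the DTD download, assuming the SAX parser bundled with the JDK (a Xerces derivative). The feature URI `http://apache.org/xml/features/nonvalidating/load-external-dtd` is a Xerces feature; other SAX implementations may reject it with `SAXNotRecognizedException`. The object and method names (`NoDtdFetch`, `nonFetchingParser`, `countElements`) and the `dtd.example.invalid` URL are hypothetical, chosen only for illustration.

```scala
import java.io.StringReader
import javax.xml.parsers.{SAXParser, SAXParserFactory}
import org.xml.sax.{Attributes, InputSource}
import org.xml.sax.helpers.DefaultHandler

// Hypothetical helper names; only the JDK APIs are real.
object NoDtdFetch {
  // Xerces feature recognized by the SAX parser bundled with the JDK.
  // When false, a non-validating parser does not load the external DTD.
  val LoadExternalDtd =
    "http://apache.org/xml/features/nonvalidating/load-external-dtd"

  def nonFetchingParser(): SAXParser = {
    val factory = SAXParserFactory.newInstance()
    factory.setValidating(false) // the feature only applies to non-validating parsing
    factory.setFeature(LoadExternalDtd, false)
    factory.newSAXParser()
  }

  // Parses `xml` and counts its elements, to show the document was read.
  def countElements(xml: String): Int = {
    var count = 0
    val handler = new DefaultHandler {
      override def startElement(uri: String, localName: String,
                                qName: String, attrs: Attributes): Unit =
        count += 1
    }
    nonFetchingParser().parse(new InputSource(new StringReader(xml)), handler)
    count
  }

  def main(args: Array[String]): Unit = {
    // ".invalid" is a reserved TLD, so any attempt to fetch this DTD would
    // fail; with the feature turned off, the parser never tries.
    val doc =
      """<?xml version="1.0"?>
        |<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        |  "http://dtd.example.invalid/xhtml1-strict.dtd">
        |<html><body><p>hello</p></body></html>""".stripMargin
    println(countElements(doc)) // prints 3
  }
}
```

If you load documents through scala-xml, a parser configured this way can likely be handed to `scala.xml.XML.withSAXParser(parser)`, whose returned loader's `load`/`loadString` methods then use it instead of the default parser; check this against the scala-xml version you depend on.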

    The second problem can't really be fixed in your Scala code. Even once you get around the DTD problem, you'll find that the source just isn't well-formed XML (for example, empty elements such as <br> are written without the closing slash XML requires).
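A small sketch of why disabling the DTD fetch is not enough: with the DTD out of the picture, an HTML 4 style `<br>` still makes the JDK's SAX parser fail, while the XML form `<br/>` parses. `WellFormedCheck` and `wellFormednessError` are hypothetical names; the same Xerces-specific feature assumption applies as above.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{InputSource, SAXParseException}
import org.xml.sax.helpers.DefaultHandler

// Hypothetical helper; only the JDK APIs are real.
object WellFormedCheck {
  private val LoadExternalDtd =
    "http://apache.org/xml/features/nonvalidating/load-external-dtd"

  /** None if `markup` is well-formed XML, otherwise the parser's error message. */
  def wellFormednessError(markup: String): Option[String] = {
    val factory = SAXParserFactory.newInstance()
    factory.setFeature(LoadExternalDtd, false) // rule out the DTD problem
    val parser = factory.newSAXParser()
    try {
      // DefaultHandler rethrows fatal errors, so malformed input ends up below.
      parser.parse(new InputSource(new StringReader(markup)), new DefaultHandler)
      None
    } catch {
      case e: SAXParseException => Some(e.getMessage)
    }
  }

  def main(args: Array[String]): Unit = {
    // Legal HTML 4: <br> is an empty element with no closing tag.
    println(wellFormednessError("<p>line one<br>line two</p>"))
    // The XML spelling of the same fragment.
    println(wellFormednessError("<p>line one<br/>line two</p>"))
  }
}
```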

    You have to either parse the page with something other than an XML parser, or look into using a utility like tidy (HTML Tidy) to convert the HTML to XML.
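As one example of "something other than an XML parser", here is a sketch using the lenient HTML parser that ships with the JDK (`javax.swing.text.html.parser.ParserDelegator` with an `HTMLEditorKit.ParserCallback`). This is a swapped-in choice, not something the answer named: it is based on an old HTML 3.2 DTD bundled with the JDK, so a dedicated HTML parsing library would probably cope better with real-world pages. `HtmlLinks` and `links` are hypothetical names.

```scala
import java.io.StringReader
import javax.swing.text.MutableAttributeSet
import javax.swing.text.html.{HTML, HTMLEditorKit}
import javax.swing.text.html.parser.ParserDelegator
import scala.collection.mutable.ListBuffer

// Hypothetical helper; ParserDelegator and ParserCallback are JDK classes.
object HtmlLinks {
  // Collects the href of every <a> tag. The Swing parser tolerates unclosed
  // <p>, <br> and <img>, and uses its own bundled DTD instead of fetching one.
  def links(html: String): List[String] = {
    val found = ListBuffer.empty[String]
    val callback = new HTMLEditorKit.ParserCallback {
      override def handleStartTag(tag: HTML.Tag, attrs: MutableAttributeSet,
                                  pos: Int): Unit =
        if (tag == HTML.Tag.A) {
          val href = attrs.getAttribute(HTML.Attribute.HREF)
          if (href != null) found += href.toString
        }
    }
    // Third argument: ignore any charset declared inside the document.
    new ParserDelegator().parse(new StringReader(html), callback, true)
    found.toList
  }

  def main(args: Array[String]): Unit = {
    // HTML 4 style markup that an XML parser would reject.
    val page =
      """<html><body>
        |<p>First answer<br><a href="/q/1">link one</a>
        |<p>Second answer <img src="x.png"><a href="/q/2">link two</a>
        |</body></html>""".stripMargin
    println(links(page))
  }
}
```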
