Scala: Parse HTML-fragment

Deadly submitted on 2019-12-11 12:13:34


Our database stores HTML fragments such as <p>A.</p><p>B.</p>. I want to include these HTML fragments from the database in a Lift snippet.

To do that, I tried to use XML.loadString() to convert the fragment into a scala.xml.Elem, but this only works for complete, well-formed XML documents:

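import scala.xml.XML

// Fails: the fragment has two top-level elements, so the parser throws.
def doesnotWork(): Unit = {
  val result = XML.loadString("<p>A</p><p>B</p>")
  assert(result === <p>A</p><p>B</p>)
}

// Passes: a single root element wraps both paragraphs.
def thisWorks(): Unit = {
  val result = XML.loadString("<test><p>A</p><p>B</p></test>")
  assert(result === <test><p>A</p><p>B</p></test>)
}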

The test doesnotWork results in an exception:

org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 10; The markup in the document following the root element must be well-formed.

Is it possible to convert just (valid) fragments to XML?


Since you're using Lift, you can wrap your XML in lift:children as a workaround. The Children snippet simply returns the element's children, which makes it very useful for wrapping fragments you need to parse.

// Passes: <lift:children> acts as the single root element.
def thisAlsoWorks(): Unit = {
  val result = XML.loadString("<lift:children><p>A</p><p>B</p></lift:children>")
  assert(result === <lift:children><p>A</p><p>B</p></lift:children>)
}
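
To reuse this for any stored fragment, a small helper can do the wrapping. This is only a sketch: LiftFragment and wrapForLift are names I made up, and it assumes the scala-xml module is on the classpath and that the fragment itself is well-formed XML.

```scala
import scala.xml.{Elem, XML}

object LiftFragment {
  // Hypothetical helper: gives the fragment a single root element by
  // wrapping it in <lift:children> before handing it to the XML parser.
  def wrapForLift(fragment: String): Elem =
    XML.loadString(s"<lift:children>$fragment</lift:children>")
}
```

Calling LiftFragment.wrapForLift("<p>A</p><p>B</p>") should give back one Elem whose two children are the <p> elements.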


You don't need a full valid XML document, but you do need a single top-level tag.

As you observed, the following works:

  XML.loadString("<test><p>A</p><p>B</p></test>")

You could then either store a sequence of Elems, or wrap them in a custom tag and extract the sequence with .child (the wrapper's direct children) or .descendant (every nested node).
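
The second option (wrap, parse, unwrap) might look like the sketch below. FragmentParser, parseFragment and the <wrapper> tag are made-up names, and it assumes the scala-xml module is on the classpath. It reads the wrapper's .child so that only the top-level nodes of the fragment come back, without the nested text nodes.

```scala
import scala.xml.{NodeSeq, XML}

object FragmentParser {
  // Hypothetical helper: wraps the fragment in a throwaway root element
  // so the parser sees a single top-level tag, then returns only the
  // root's direct children (the original fragment nodes).
  def parseFragment(fragment: String): NodeSeq = {
    val wrapped = XML.loadString(s"<wrapper>$fragment</wrapper>")
    NodeSeq.fromSeq(wrapped.child)
  }
}
```

The result is a NodeSeq, which is what a Lift snippet normally returns. A fragment that is not well-formed XML (an unclosed <br>, for example) will still throw a SAXParseException.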

