snakeyaml and spark results in an inability to construct objects

喜夏-厌秋 提交于 2019-12-04 03:58:43
Paweł Bartkiewicz

Solution

Create a self-contained application and run it using spark-submit instead of using spark-shell.

I've created a minimal project for you as a gist here. All you need to do is put both files (build.sbt and Main.scala) in some directory, then run:

sbt package

in order to create a JAR. The JAR will be in target/scala-2.11/sparksnakeyamltest_2.11-1.0.jar or a similar location. You can get SBT from here if you haven't used it yet. Finally, you can run the project:

/home/placey/Downloads/spark-2.0.0-bin-hadoop2.7/bin/spark-submit --class "Main" --master local --jars /home/placey/snakeyaml-1.17.jar target/scala-2.11/sparksnakeyamltest_2.11-1.0.jar

The output should be:

[many lines of Spark's log)]
acct (Ymail Account)
[more lines of Spark's log)]

Explanation

Spark's shell (REPL) transforms all classes you define in it by adding $iw parameter to your constructors. I've explained it here. SnakeYAML expects a zero-parameter constructor for JavaBean-like classes, but there isn't one, so it fails.

You can try this yourself:

scala> class Foo() {}
defined class Foo

scala> classOf[Foo].getConstructors()
res0: Array[java.lang.reflect.Constructor[_]] = Array(public Foo($iw))

scala> classOf[Foo].getConstructors()(0).getParameterCount
res1: Int = 1

As you can see, Spark transforms the constructor by adding a parameter of type $iw.

Alternative solutions

Define your own Constructor

If you really need to get it working in the shell, you could define your own class implementing org.yaml.snakeyaml.constructor.BaseConstructor and make sure that $iw gets passed to constructors, but this is a lot of work (I actually wrote my own Constructor in Scala for security reasons some time ago, so I have some experience with this).

You could also define a custom Constructor hard-coded to instantiate a specific class (EmailAccount in your case) similar to the DiceConstructor shown in SnakeYAML's documentation. This is much easier, but requires writing code for each class you want to support.

Example:

case class EmailAccount(accountName: String)

class EmailAccountConstructor extends org.yaml.snakeyaml.constructor.Constructor {

  val emailAccountTag = new org.yaml.snakeyaml.nodes.Tag("!emailAccount")
  this.rootTag = emailAccountTag
  this.yamlConstructors.put(emailAccountTag, new ConstructEmailAccount)

  private class ConstructEmailAccount extends org.yaml.snakeyaml.constructor.AbstractConstruct {
    def construct(node: org.yaml.snakeyaml.nodes.Node): Object = {
      // TODO: This is fine for quick prototyping in a REPL, but in a real
      //       application you should probably add type checks.
      val mnode = node.asInstanceOf[org.yaml.snakeyaml.nodes.MappingNode]
      val mapping = constructMapping(mnode)
      val name = mapping.get("accountName").asInstanceOf[String]
      new EmailAccount(name)
    }
  }

}

You can save this as a file and load it in the REPL using :load filename.scala.

Bonus advantage of this solution is that it can create immutable case class instances directly. Unfortunately Scala REPL seems to have issues with imports, so I've used fully qualified names.

Don't use JavaBeans

You can also just parse YAML documents as simple Java maps:

scala> val yaml2 = new Yaml()
yaml2: org.yaml.snakeyaml.Yaml = Yaml:1141996301

scala> val e2 = yaml2.load(text)
e2: Object = {accountName=Ymail Account}

scala> val map = e2.asInstanceOf[java.util.Map[String, Any]]
map: java.util.Map[String,Any] = {accountName=Ymail Account}

scala> map.get("accountName")
res4: Any = Ymail Account

This way SnakeYAML won't need to use reflection.

However, since you're using Scala, I recommend trying MoultingYAML, which is a Scala wrapper for SnakeYAML. It parses YAML documents to simple Java types and then maps them to Scala types (even your own types like EmailAccount).

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!