The following code executes fine in a scala shell given snakeyaml version 1.17
import org.yaml.snakeyaml.Yaml
import org.yaml.snakeyaml.constructor.Constructor
import scala.collection.mutable.ListBuffer
import scala.beans.BeanProperty
class EmailAccount {
@scala.beans.BeanProperty var accountName: String = null
override def toString: String = {
return s"acct ($accountName)"
}
}
val text = """accountName: Ymail Account"""
val yaml = new Yaml(new Constructor(classOf[EmailAccount]))
val e = yaml.load(text).asInstanceOf[EmailAccount]
println(e)
However when running in spark (2.0.0 in this case) the resulting error is:
org.yaml.snakeyaml.constructor.ConstructorException: Can't construct a java object for tag:yaml.org,2002:EmailAccount; exception=java.lang.NoSuchMethodException: EmailAccount.<init>()
in 'string', line 1, column 1:
accountName: Ymail Account
^
at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:350)
at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:182)
at org.yaml.snakeyaml.constructor.BaseConstructor.constructDocument(BaseConstructor.java:141)
at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:127)
at org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:450)
at org.yaml.snakeyaml.Yaml.load(Yaml.java:369)
... 48 elided
Caused by: org.yaml.snakeyaml.error.YAMLException: java.lang.NoSuchMethodException: EmailAccount.<init>()
at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.createEmptyJavaBean(Constructor.java:220)
at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.construct(Constructor.java:190)
at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:346)
... 53 more
Caused by: java.lang.NoSuchMethodException: EmailAccount.<init>()
at java.lang.Class.getConstructor0(Class.java:2810)
at java.lang.Class.getDeclaredConstructor(Class.java:2053)
at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.createEmptyJavaBean(Constructor.java:216)
... 55 more
I launched the scala shell with
scala -classpath "/home/placey/snakeyaml-1.17.jar"
I launched the spark shell with
/home/placey/Downloads/spark-2.0.0-bin-hadoop2.7/bin/spark-shell --master local --jars /home/placey/snakeyaml-1.17.jar
Solution
Create a self-contained application and run it using spark-submit
instead of using spark-shell
.
I've created a minimal project for you as a gist here. All you need to do is put both files (build.sbt
and Main.scala
) in some directory, then run:
sbt package
in order to create a JAR. The JAR will be in target/scala-2.11/sparksnakeyamltest_2.11-1.0.jar
or a similar location. You can get SBT from here if you haven't used it yet. Finally, you can run the project:
/home/placey/Downloads/spark-2.0.0-bin-hadoop2.7/bin/spark-submit --class "Main" --master local --jars /home/placey/snakeyaml-1.17.jar target/scala-2.11/sparksnakeyamltest_2.11-1.0.jar
The output should be:
[many lines of Spark's log)]
acct (Ymail Account)
[more lines of Spark's log)]
Explanation
Spark's shell (REPL) transforms all classes you define in it by adding $iw
parameter to your constructors. I've explained it here. SnakeYAML expects a zero-parameter constructor for JavaBean-like classes, but there isn't one, so it fails.
You can try this yourself:
scala> class Foo() {}
defined class Foo
scala> classOf[Foo].getConstructors()
res0: Array[java.lang.reflect.Constructor[_]] = Array(public Foo($iw))
scala> classOf[Foo].getConstructors()(0).getParameterCount
res1: Int = 1
As you can see, Spark transforms the constructor by adding a parameter of type $iw
.
Alternative solutions
Define your own Constructor
If you really need to get it working in the shell, you could define your own class implementing org.yaml.snakeyaml.constructor.BaseConstructor
and make sure that $iw
gets passed to constructors, but this is a lot of work (I actually wrote my own Constructor
in Scala for security reasons some time ago, so I have some experience with this).
You could also define a custom Constructor
hard-coded to instantiate a specific class (EmailAccount
in your case) similar to the DiceConstructor
shown in SnakeYAML's documentation. This is much easier, but requires writing code for each class you want to support.
Example:
case class EmailAccount(accountName: String)
class EmailAccountConstructor extends org.yaml.snakeyaml.constructor.Constructor {
val emailAccountTag = new org.yaml.snakeyaml.nodes.Tag("!emailAccount")
this.rootTag = emailAccountTag
this.yamlConstructors.put(emailAccountTag, new ConstructEmailAccount)
private class ConstructEmailAccount extends org.yaml.snakeyaml.constructor.AbstractConstruct {
def construct(node: org.yaml.snakeyaml.nodes.Node): Object = {
// TODO: This is fine for quick prototyping in a REPL, but in a real
// application you should probably add type checks.
val mnode = node.asInstanceOf[org.yaml.snakeyaml.nodes.MappingNode]
val mapping = constructMapping(mnode)
val name = mapping.get("accountName").asInstanceOf[String]
new EmailAccount(name)
}
}
}
You can save this as a file and load it in the REPL using :load filename.scala
.
Bonus advantage of this solution is that it can create immutable case class instances directly. Unfortunately Scala REPL seems to have issues with imports, so I've used fully qualified names.
Don't use JavaBeans
You can also just parse YAML documents as simple Java maps:
scala> val yaml2 = new Yaml()
yaml2: org.yaml.snakeyaml.Yaml = Yaml:1141996301
scala> val e2 = yaml2.load(text)
e2: Object = {accountName=Ymail Account}
scala> val map = e2.asInstanceOf[java.util.Map[String, Any]]
map: java.util.Map[String,Any] = {accountName=Ymail Account}
scala> map.get("accountName")
res4: Any = Ymail Account
This way SnakeYAML won't need to use reflection.
However, since you're using Scala, I recommend trying
MoultingYAML, which is a Scala wrapper for SnakeYAML. It parses YAML documents to simple Java types and then maps them to Scala types (even your own types like EmailAccount
).
来源:https://stackoverflow.com/questions/38002883/snakeyaml-and-spark-results-in-an-inability-to-construct-objects