I have a Spark 2.1 job where I maintain multiple Dataset objects/RDDs that represent different queries over our underlying Hive/HDFS datastore. I've noticed that if I si
Yes, you can use multithreading in the driver code, but usually this does not improve performance, because each query is already executed in parallel across the cluster. It helps only when your queries operate on very skewed data and/or cannot be parallelized well enough to use all available resources on their own.
You can do something like this:
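import org.apache.spark.sql.Dataset

val datasets: Seq[Dataset[_]] = ??? // the Datasets you want to persist
datasets
  .par // convert to a parallel collection so the writes are submitted from several threads
  .foreach(ds => ds.write.saveAsTable(...)) // each write triggers its own Spark job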
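A possible variant, sketched under assumptions: `.par` runs on Scala's default fork-join pool, so the number of concurrently submitted Spark jobs is not stated explicitly. Submitting each write as a `Future` on a fixed-size thread pool makes that number a parameter. `ConcurrentJobs` and `runAll` are hypothetical names; the helper is plain Scala and does not depend on Spark, so each job is just a function.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Hypothetical helper: runs the given jobs on a fixed-size thread pool
// and blocks until all of them have finished.
object ConcurrentJobs {
  def runAll[A](jobs: Seq[() => A], threads: Int): Seq[A] = {
    val pool = Executors.newFixedThreadPool(threads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // Each job is started as soon as a pool thread is free.
      val futures = jobs.map(job => Future(job()))
      // Future.sequence keeps the results in the same order as the input jobs.
      Await.result(Future.sequence(futures), Duration.Inf)
    } finally {
      // Shut the pool down so its non-daemon threads do not keep the JVM alive.
      pool.shutdown()
    }
  }
}
```

In the Spark job it could be called as `ConcurrentJobs.runAll(datasets.map(ds => () => ds.write.saveAsTable(...)), 4)`, where the table name stays elided as above and `4` is an arbitrary pool size. Jobs submitted from several driver threads to the same `SparkSession` are scheduled concurrently; by default Spark runs them in FIFO order, and setting `spark.scheduler.mode` to `FAIR` shares executor resources more evenly between them.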