Workaround for importing spark implicits everywhere

Something that would help a bit is to do the import once inside the class or object instead of inside each function. For your "File A" and "File B" examples:

File A

import org.apache.spark.sql.{Dataset, SparkSession}

// example element type, assumed for both files
case class Foo(value: Int)

class A {
    val spark = SparkSession.builder.getOrCreate()
    import spark.implicits._

    def job() = {
        val ds = Seq(Foo(1)).toDS() // create dataset ds (example data)
        val b = new B(spark)
        b.doSomething(ds)
        doSomething(ds)
    }

    private def doSomething(ds: Dataset[Foo]) = {
        ds.map(e => 1)
    }
}

File B

import org.apache.spark.sql.{Dataset, SparkSession}

class B(spark: SparkSession) {
    import spark.implicits._

    def doSomething(ds: Dataset[Foo]) = {
        ds.map(e => "SomeString")
    }
}

This way, the number of imports stays manageable.

Unfortunately, to my knowledge there is no other way to reduce the number of imports further. This is because a concrete SparkSession object is needed when doing the actual import, so this is about the best that can be done.
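To see why, note that Scala only allows importing from a stable identifier (a val or an object, not a var or an arbitrary expression), and implicits._ is a member of the session instance. A minimal sketch of this constraint (the ImplicitsDemo object and the local master are just illustrative):

import org.apache.spark.sql.SparkSession

object ImplicitsDemo {
  val spark: SparkSession = SparkSession.builder.master("local[*]").getOrCreate()
  import spark.implicits._ // compiles: spark is a val, i.e. a stable identifier

  var session = spark
  // import session.implicits._ // would not compile: "stable identifier required"
}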


Update:

An even more convenient method is to create a Scala trait and combine it with an empty companion object. This makes it easy to import the implicits at the top of each file, while classes that need the SparkSession object itself can extend the trait.

Example:

import org.apache.spark.sql.SparkSession

trait SparkJob {
  val spark: SparkSession = SparkSession.builder
    .master(...) // e.g. "local[*]"
    .config(..., ...) // any settings to be applied
    .getOrCreate()
}

object SparkJob extends SparkJob {}

Because SparkJob is a top-level object, SparkJob.spark is a stable path, so its implicits can be imported at the top of any file. With this, File A and File B become:

File A:

import SparkJob.spark.implicits._
class A extends SparkJob {
  spark.sql(...) // Allows for usage of the SparkSession inside the class
  ...
}

File B:

import SparkJob.spark.implicits._
class B extends SparkJob {
  ...    
}

Note that it's only necessary to extend SparkJob for the classes or objects that use the spark object itself; everywhere else the import alone is enough, as the sketch below shows.
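For instance, a helper that only maps over Datasets (a hypothetical Transformations object, reusing the Foo type from the examples above) needs nothing beyond the import:

import org.apache.spark.sql.Dataset
import SparkJob.spark.implicits._

object Transformations {
  // Only the implicit Encoder[String] from the import is used here,
  // never the spark object itself, so there is no need to extend SparkJob.
  def label(ds: Dataset[Foo]): Dataset[String] =
    ds.map(_ => "SomeString")
}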
