Importing spark.implicits._ in Scala

Asked 2020-12-25 09:33

I am trying to import spark.implicits._. Apparently, this is an object inside a class in Scala. When I import it in a method like so:

def f() = {
  val spark          


        
8 Answers
  • 2020-12-25 10:10

    I know this is an old post, but I would just like to share my pointers on this. I think the issue is with the way you are declaring the SparkSession. When you declare it as a var it is not immutable and can be reassigned later, so the compiler does not allow importing the implicits from it, since the reference could change at a later stage and lead to ambiguity. A val does not have that problem. For example, see the sketch below.
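
    A minimal sketch of the difference (the object name `ImplicitsDemo` is just for illustration): importing the implicits requires a stable identifier, which a val provides and a var does not.

    import org.apache.spark.sql.SparkSession

    object ImplicitsDemo {
      // Declared as a val: a stable identifier, so its implicits can be imported.
      val spark: SparkSession = SparkSession.builder()
        .master("local[*]")
        .appName("implicits-demo")
        .getOrCreate()

      import spark.implicits._   // compiles

      // Had `spark` been declared as a var, the import above would fail with
      // "stable identifier required, but this.spark found".

      val ds = Seq(1, 2, 3).toDS()   // uses the imported Encoder[Int]
    }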

  • 2020-12-25 10:12

    Thanks to @bluenote10 for the helpful answer; we can simplify it further, for example by removing the helper object testImplicits:

    private object testImplicits extends SQLImplicits {
      protected override def _sqlContext: SQLContext = self.spark.sqlContext
    }
    

    and using the following approach instead:

    import org.apache.spark.sql.{SQLImplicits, SparkSession}
    import org.scalatest.{BeforeAndAfterAll, Suite}

    trait SharedSparkSession extends BeforeAndAfterAll { self: Suite =>
    
      /**
       * The SparkSession instance to use for all tests in one suite.
       */
      private var spark: SparkSession = _
    
      /**
       * Returns local running SparkSession instance.
       * @return SparkSession instance `spark`
       */
      protected def sparkSession: SparkSession = spark
    
      /**
       * A helper implicit value that allows us to import SQL implicits.
       */
      protected lazy val sqlImplicits: SQLImplicits = self.sparkSession.implicits
    
      /**
       * Starts a new local spark session for tests.
       */
      protected def startSparkSession(): Unit = {
        if (spark == null) {
          spark = SparkSession
            .builder()
            .master("local[2]")
            .appName("Testing Spark Session")
            .getOrCreate()
        }
      }
    
      /**
       * Stops existing local spark session.
       */
      protected def stopSparkSession(): Unit = {
        if (spark != null) {
          spark.stop()
          spark = null
        }
      }
    
      /**
       * Runs before all tests and starts spark session.
       */
      override def beforeAll(): Unit = {
        startSparkSession()
        super.beforeAll()
      }
    
      /**
       * Runs after all tests and stops existing spark session.
       */
      override def afterAll(): Unit = {
        super.afterAll()
        stopSparkSession()
      }
    }
    

    Finally, we can mix SharedSparkSession into a unit test and import sqlImplicits:

    import org.scalatest.FunSuite

    class SomeSuite extends FunSuite with SharedSparkSession {
      // We can import sql implicits 
      import sqlImplicits._
    
      // We can use method sparkSession which returns locally running spark session
      test("some test") {
        val df = sparkSession.sparkContext.parallelize(List(1,2,3)).toDF()
        //...
      }
    }
    
  • 2020-12-25 10:19

    I just instantiate the SparkSession and, before using it, import the implicits.

    import org.apache.spark.sql.SparkSession

    @transient lazy val spark = SparkSession
      .builder()
      .master("spark://master:7777")
      .getOrCreate()
    
    import spark.implicits._
    
  • 2020-12-25 10:22

    Well, I've been re-using the existing SparkSession in each called method by creating a local val inside the method:

    val spark: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession.active
    

    And then

    import spark.implicits._
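
    Putting the two snippets together, a minimal sketch (the method name `toDf` is hypothetical) of re-using the active session inside a called method:

    import org.apache.spark.sql.{DataFrame, SparkSession}

    def toDf(xs: Seq[Int]): DataFrame = {
      // Re-use the session started elsewhere in the application (SparkSession.active, Spark 2.4+).
      val spark: SparkSession = SparkSession.active
      import spark.implicits._   // allowed: `spark` is a local val, i.e. a stable identifier
      xs.toDF("value")
    }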
    
  • 2020-12-25 10:25

    You can do something similar to what is done in the Spark testing suites. For example, this would work (inspired by SQLTestData):

    import org.apache.spark.sql.{SQLContext, SQLImplicits, SparkSession}
    import org.scalatest.{BeforeAndAfter, FlatSpec}

    class SomeSpec extends FlatSpec with BeforeAndAfter { self =>
    
      var spark: SparkSession = _
    
      private object testImplicits extends SQLImplicits {
        protected override def _sqlContext: SQLContext = self.spark.sqlContext
      }
      import testImplicits._
    
      before {
        spark = SparkSession.builder().master("local").getOrCreate()
      }
    
      "a test" should "run" in {
        // implicits are working
        val df = spark.sparkContext.parallelize(List(1,2,3)).toDF()
      }
    }
    

    Alternatively, you may use something like SharedSQLContext directly, which provides a testImplicits: SQLImplicits, i.e.:

    class SomeSpec extends FlatSpec with SharedSQLContext {
      import testImplicits._
    
      // ...
    
    }
    
  • 2020-12-25 10:28

    I think the code in the SparkSession.scala file on GitHub can give you a good hint:

    /**
     * :: Experimental ::
     * (Scala-specific) Implicit methods available in Scala for converting
     * common Scala objects into [[DataFrame]]s.
     *
     * {{{
     *   val sparkSession = SparkSession.builder.getOrCreate()
     *   import sparkSession.implicits._
     * }}}
     *
     * @since 2.0.0
     */
    @Experimental
    object implicits extends SQLImplicits with Serializable {
      protected override def _sqlContext: SQLContext = SparkSession.this.sqlContext
    }
    

    here "spark" in "spark.implicits._" is just the sparkSession object we created.

    Here is another reference!
