Mocking SparkSession for unit testing

Submitted by 余生颓废 on 2021-02-07 08:14:03

Question


I have a method in my Spark application that loads data from a MySQL database. The method looks something like this:

trait DataManager {

  val session: SparkSession

  def loadFromDatabase(input: Input): DataFrame = {
    session.read.jdbc(input.jdbcUrl, s"(${input.selectQuery}) T0",
      input.columnName, 0L, input.maxId, input.parallelism, input.connectionProperties)
  }
}

The method does nothing other than call the jdbc method and load data from the database. How can I test it? The standard approach would be to mock the session object, which is an instance of SparkSession. But since SparkSession has a private constructor, I was not able to mock it using ScalaMock.

The main point is that my function is purely side-effecting (the side effect being pulling data from a relational database). How can I unit test it, given that I have trouble mocking SparkSession?

So is there any way to mock SparkSession, or a better way than mocking to test this method?


Answer 1:


In your case I would recommend not mocking the SparkSession. Doing so would more or less mock the entire function (which you could do anyway). If you want to test this function, my suggestion would be to run an embedded database (like H2) and use a real SparkSession. To do this, you need to pass the SparkSession to your DataManager.

Untested sketch:

Your code:

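import org.apache.spark.sql.{DataFrame, SparkSession}

// The session is passed in, so a test can supply its own local one.
class DataManager(session: SparkSession) {
  def loadFromDatabase(input: Input): DataFrame = {
    session.read.jdbc(input.jdbcUrl, s"(${input.selectQuery}) T0",
      input.columnName, 0L, input.maxId, input.parallelism, input.connectionProperties)
  }
}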

Your test-case:

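import java.sql.DriverManager
import org.apache.spark.sql.SparkSession
// In ScalaTest 3.1+ FunSuite is org.scalatest.funsuite.AnyFunSuite.
import org.scalatest.{BeforeAndAfterAll, FunSuite}

class DataManagerTest extends FunSuite with BeforeAndAfterAll {
  override def beforeAll(): Unit = {
    val conn = DriverManager.getConnection("jdbc:h2:~/test", "sa", "")
    // your insert statements go here
    conn.close()
  }

  test("should load data from database") {
    // A local master lets the test run without a cluster.
    val session = SparkSession.builder().master("local[*]").getOrCreate()
    val dm = new DataManager(session)
    // Also set the other Input fields that loadFromDatabase reads
    // (columnName, maxId, parallelism, connectionProperties).
    val input = Input(jdbcUrl = "jdbc:h2:~/test", selectQuery = "SELECT whateveryounedd FROM whereeveryouputit ")
    val actualData = dm.loadFromDatabase(input)
    // Placeholder check; compare actualData with the rows inserted in beforeAll.
    assert(actualData.count() > 0)
  }
}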



Answer 2:


You can use mockito-scala to mock SparkSession, as shown in this article.
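The linked article is not reproduced here, so the following is only a rough sketch of what mocking the session could look like. It uses plain Mockito (mockito-core) called from Scala rather than the mockito-scala wrapper, and it assumes Spark 2.x/3.x, where SparkSession and DataFrameReader are non-final classes. The Input case class and the MockedSessionSketch object are hypothetical stand-ins, since the question does not show how Input is defined.

```scala
import java.util.Properties

import org.apache.spark.sql.{DataFrame, DataFrameReader, SparkSession}
import org.mockito.ArgumentMatchers.{any, anyInt, anyLong, anyString}
import org.mockito.Mockito.{mock, verify, when}

// Hypothetical stand-in for the question's Input type; the real fields may differ.
case class Input(
    jdbcUrl: String,
    selectQuery: String,
    columnName: String,
    maxId: Long,
    parallelism: Int,
    connectionProperties: Properties)

// Same shape as the DataManager class in Answer 1.
class DataManager(session: SparkSession) {
  def loadFromDatabase(input: Input): DataFrame =
    session.read.jdbc(input.jdbcUrl, s"(${input.selectQuery}) T0",
      input.columnName, 0L, input.maxId, input.parallelism, input.connectionProperties)
}

object MockedSessionSketch {
  def main(args: Array[String]): Unit = {
    // Mockito creates the mocks without calling SparkSession's private constructor.
    val session = mock(classOf[SparkSession])
    val reader = mock(classOf[DataFrameReader])
    val frame = mock(classOf[DataFrame])

    // Stub the call chain session.read.jdbc(...) to hand back the mocked DataFrame.
    when(session.read).thenReturn(reader)
    when(reader.jdbc(anyString(), anyString(), anyString(), anyLong(), anyLong(),
      anyInt(), any(classOf[Properties]))).thenReturn(frame)

    val props = new Properties()
    val input = Input("jdbc:h2:~/test", "SELECT id FROM t", "id", 100L, 4, props)
    val result = new DataManager(session).loadFromDatabase(input)

    // All this can show is that the arguments were forwarded as expected.
    assert(result eq frame)
    verify(reader).jdbc("jdbc:h2:~/test", "(SELECT id FROM t) T0", "id", 0L, 100L, 4, props)
  }
}
```

A test like this never touches a database, so it only checks how loadFromDatabase assembles its arguments (for example the "(...) T0" wrapping of the query). That is the trade-off Answer 1 points out: with the session mocked, nearly the whole function is mocked too.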



Source: https://stackoverflow.com/questions/49483987/mocking-sparksession-for-unit-testing
