How to perform Unit testing on Spark Structured Streaming?

别那么骄傲 2021-01-24 21:53

I would like to know how to unit test Spark Structured Streaming. My scenario: I am getting data from Kafka and consuming it with Spark Structured Streaming…

1 Answer
  • 2021-01-24 22:01

    tl;dr Use MemoryStream to add events and the memory sink to capture the output.

    The following code should help you get started:

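    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.execution.streaming.{LongOffset, MemoryStream}
    import scala.concurrent.duration._ // for 1.second and 2.seconds below

    // spark is an existing SparkSession (e.g. the one in spark-shell).
    // MemoryStream needs an implicit SQLContext and an Encoder for Event.
    implicit val sqlCtx: SQLContext = spark.sqlContext
    import spark.implicits._

    // Event (a case class) and eventGen are assumed to come from your own test code
    val events = MemoryStream[Event]
    val sessions = events.toDS()
    assert(sessions.isStreaming, "sessions must be a streaming Dataset")

    // use sessions event stream to apply required transformations
    val transformedSessions = ...

    // queryName, checkpointLocation and queryOutputMode are defined by your test;
    // the memory sink supports only the Append and Complete output modes
    val streamingQuery = transformedSessions
      .writeStream
      .format("memory")
      .queryName(queryName)
      .option("checkpointLocation", checkpointLocation)
      .outputMode(queryOutputMode)
      .start()

    // Add events to MemoryStream as if they came from Kafka
    val batch = Seq(
      eventGen.generate(userId = 1, offset = 1.second),
      eventGen.generate(userId = 2, offset = 2.seconds))
    val currentOffset = events.addData(batch)
    // block until every event added so far has been processed by the query
    streamingQuery.processAllAvailable()
    // commit the offset so MemoryStream can release the events it has buffered
    events.commit(currentOffset.asInstanceOf[LongOffset])

    // check the output
    // The output is in the in-memory table named queryName
    // The following code simply shows the result
    spark
      .table(queryName)
      .show(truncate = false)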
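
    For reference, here is a self-contained sketch of the same flow that runs on a local SparkSession and asserts on the output instead of calling show. It assumes Spark 3.x with Scala 2.12 or 2.13; the Event case class, the countByUser transformation, the runTwoBatches helper and the user_counts query name are all hypothetical, made up for illustration. Note that MemoryStream lives in the internal org.apache.spark.sql.execution.streaming package, so it is not a stable public API and may change between Spark versions.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SQLContext, SparkSession}
import org.apache.spark.sql.execution.streaming.MemoryStream
import org.apache.spark.sql.streaming.OutputMode

// Hypothetical event type for this sketch; use your own Kafka payload type.
case class Event(userId: Int, action: String)

object EventCountStreamTest {

  // Hypothetical transformation under test, kept as a plain function so production
  // code can apply it to the Kafka stream and tests can apply it to a MemoryStream.
  def countByUser(events: Dataset[Event]): DataFrame =
    events.groupBy("userId").count()

  // Feeds two micro-batches through a MemoryStream and returns the memory-sink
  // table after each one as a userId -> count map.
  def runTwoBatches(spark: SparkSession): (Map[Int, Long], Map[Int, Long]) = {
    import spark.implicits._
    // MemoryStream.apply needs an implicit SQLContext and an Encoder[Event].
    implicit val sqlCtx: SQLContext = spark.sqlContext

    val events = MemoryStream[Event]
    val query = countByUser(events.toDS())
      .writeStream
      .format("memory")                  // results go to an in-memory table on the driver
      .queryName("user_counts")          // name of that in-memory table
      .outputMode(OutputMode.Complete()) // the whole table is rewritten after each batch
      .start()

    // Reads the current contents of the memory-sink table (columns: userId, count).
    def snapshot(): Map[Int, Long] =
      spark.table("user_counts").as[(Int, Long)].collect().toMap

    try {
      events.addData(Event(1, "click"), Event(2, "click"), Event(1, "view"))
      query.processAllAvailable() // wait until the events above are processed
      val first = snapshot()

      events.addData(Event(2, "view"))
      query.processAllAvailable()
      val second = snapshot()

      (first, second)
    } finally {
      query.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("memory-stream-test")
      .config("spark.sql.shuffle.partitions", "1") // fewer state partitions, faster test
      .getOrCreate()
    try {
      val (first, second) = runTwoBatches(spark)
      assert(first == Map(1 -> 2L, 2 -> 1L), s"unexpected first result: $first")
      assert(second == Map(1 -> 2L, 2 -> 2L), s"unexpected second result: $second")
      println("MemoryStream test passed")
    } finally {
      spark.stop()
    }
  }
}
```

    Keeping the transformation in a plain function such as countByUser lets the production job apply it to the Kafka stream while the test applies it to the MemoryStream. Because the memory sink keeps all output on the driver, it is only suitable for small test data.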
    