Spark Streaming: saving data to MySQL with foreachRDD() in Scala
Can somebody please give me a working example of saving a Spark Streaming DStream to a MySQL database using foreachRDD() in Scala? I have the code below, but it's not working. I just need a simple example, not syntax or theory.
package examples
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark._
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import StreamingContext._
import org.apache.hadoop.io.Text
import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.mapred.SequenceFileOutputFormat
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext
import java.util.Properties
import org.apache.spark.sql.SaveMode
object StreamingToMysql {
def main(args: Array[String]) {
val sparkConf = new SparkConf().setAppName("NetworkWordCount").setMaster("local[*]")
val sc = new SparkContext(sparkConf)
val sqlContext = new SQLContext(sc)
val hiveCtx= new HiveContext(sc)
import hiveCtx.implicits._
val ssc = new StreamingContext(sc, Seconds(1))
val lines = ssc.socketTextStream("localhost", 9999)
val rdd = sc.parallelize(List(1))
val df = rdd.toDF()
val split = lines.map(line => line.split(",") )
val input = split.map(x => x(0))
input.foreachRDD { rdd =>
if (rdd.take (1).size == 1) {
rdd.foreachPartition { iterator =>
iterator.foreach {
val connectionProperties = new Properties()
connectionProperties.put("user", "root")
connectionProperties.put("password", "admin123")
.jdbc("jdbc:mysql://", "topics", connectionProperties)
To write data from Spark Streaming to an external system, you can use either the high-level DataFrame API or the low-level RDD API. The code above mixes both approaches, which is why it does not work.
Assuming that you know the structure of the incoming data in Spark Streaming, you can create a DataFrame out of each RDD and use the DataFrame API to save it:
First, create a schema for the data:
case class MyStructure(field: Type,....)
Then apply the schema to the incoming stream:
val structuredData = dstream.map(record => MyStructure(record.field1, ... record.fieldn))
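As a concrete illustration of the two steps above, here is a minimal sketch for the comma-separated lines in the question. The names `Topic`, `name` and `TopicParser` are hypothetical, and the single-column layout is an assumption based on the question keeping only the first field of each line.

```scala
// Hypothetical schema: one String column called `name`.
case class Topic(name: String)

object TopicParser {
  // Takes the first comma-separated field, as the question's code does,
  // and drops lines whose first field is missing or blank.
  def parseLine(line: String): Option[Topic] =
    line.split(",").headOption.map(_.trim).filter(_.nonEmpty).map(name => Topic(name))
}

// With a DStream[String] named `lines`, the structured stream could be built as:
//   val structuredData = lines.flatMap(line => TopicParser.parseLine(line).toSeq)
```

Returning an Option instead of indexing with x(0) means a malformed line is skipped rather than failing the whole batch.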
Now use foreachRDD to transform each RDD in the DStream into a DataFrame and use the DataFrame API to save it:
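// JDBC writer configuration
import java.util.Properties
val connectionProperties = new Properties()
connectionProperties.put("user", "root")
connectionProperties.put("password", "*****")

structuredData.foreachRDD { rdd =>
  // toDF() needs the SQLContext implicits in scope (the question imports hiveCtx.implicits._)
  val df = rdd.toDF() // create a DataFrame from the RDD of case-class instances
  // Append mode is an assumption: the default save mode fails once the table exists
  df.write.mode("append").jdbc("jdbc:mysql://", "topics", connectionProperties)
}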
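For comparison, the low-level RDD route mentioned at the start of this answer could look roughly like the sketch below. It uses plain JDBC inside foreachPartition, so each partition opens one connection on the executor. The object name `JdbcSink`, the `topics (name)` table layout, and the `url`, `user` and `password` parameters are all assumptions for illustration, and the MySQL JDBC driver is assumed to be on the executor classpath.

```scala
import java.sql.{Connection, DriverManager}
import org.apache.spark.streaming.dstream.DStream

object JdbcSink {
  // Hypothetical table and column names; adjust to the real schema.
  val InsertSql = "INSERT INTO topics (name) VALUES (?)"

  // Writes every record of one partition over a single connection.
  // Empty partitions return early without opening a connection.
  def writePartition(records: Iterator[String], connect: () => Connection): Unit = {
    if (records.hasNext) {
      val conn = connect()
      try {
        val stmt = conn.prepareStatement(InsertSql)
        try {
          records.foreach { name =>
            stmt.setString(1, name)
            stmt.addBatch()
          }
          stmt.executeBatch()
        } finally stmt.close()
      } finally conn.close()
    }
  }

  // url, user and password are placeholders supplied by the caller.
  def save(input: DStream[String], url: String, user: String, password: String): Unit =
    input.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // The connection is created on the executor, so it is never serialized.
        writePartition(records, () => DriverManager.getConnection(url, user, password))
      }
    }
}
```

Creating the connection inside foreachPartition rather than outside foreachRDD matters because JDBC connections are not serializable and cannot be shipped from the driver to the executors. A connection pool would reduce the per-batch overhead further, but is left out to keep the sketch short.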