Call a function with each element of a stream in Databricks

情书的邮戳 2021-01-23 15:17

I have a streaming DataFrame in Databricks, and I want to perform an action on each element. Online I found special-purpose methods, like writing it to the console or dumping

1 Answer
  • 2021-01-23 15:56

    Here is an example that reads a stream of CSV files with the Structured Streaming API and uses foreachBatch to save every item of each micro-batch to Redis.

    Related to a previous question (DataFrame to RDD[(String, String)] conversion)

    // import spark and spark-redis
    import org.apache.spark._
    import org.apache.spark.sql._
    import org.apache.spark.streaming._
    import org.apache.spark.sql.types._
    
    import com.redislabs.provider.redis._
    
    // schema of csv files
    val userSchema = new StructType()
        .add("name", "string")
        .add("age", "string")
    
    // create a data stream reader from a dir with csv files
    val csvDF = spark
      .readStream
      .format("csv")
      .option("sep", ";")
      .schema(userSchema)
      .load("./data") // directory where the CSV files are 
    
    // redis
    val redisConfig = new RedisConfig(new RedisEndpoint("localhost", 6379))
    implicit val readWriteConfig: ReadWriteConfig = ReadWriteConfig.Default
    
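    // map each row to a (key, value) pair: name as the key, age as the value
    // (outside a notebook or spark-shell this needs `import spark.implicits._` for the tuple encoder)
    csvDF.map(r => (r.getString(0), r.getString(1))) // converts the dataset to a Dataset[(String, String)]
      .writeStream // create a data stream writer
      // explicit parameter types avoid the overload ambiguity that foreachBatch can hit on Scala 2.12
      .foreachBatch((df: Dataset[(String, String)], _: Long) => sc.toRedisKV(df.rdd)(redisConfig)) // save each batch to redis after converting it to an RDD
      .start() // start processing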
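
If the goal is to call a function on every single element rather than on whole batches, Structured Streaming also provides `foreach` with a `ForeachWriter`. The following is only a minimal sketch under a few assumptions: it reuses the CSV schema and the `./data` directory from the example above, and `handle` is a hypothetical placeholder for your own logic, not part of Spark or spark-redis. In a Databricks notebook the `spark` session already exists, so only the body of `main` would be needed.

```scala
import org.apache.spark.sql.{ForeachWriter, Row, SparkSession}
import org.apache.spark.sql.types.StructType

object ForeachElementSketch {
  // Hypothetical per-element action; replace it with your own business logic.
  def handle(name: String, age: String): Unit = println(s"$name -> $age")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("foreach-sketch").getOrCreate()

    // same schema as in the answer above
    val userSchema = new StructType()
      .add("name", "string")
      .add("age", "string")

    val csvDF = spark.readStream
      .format("csv")
      .option("sep", ";")
      .schema(userSchema)
      .load("./data") // assumed input directory, as above

    val query = csvDF.writeStream
      .foreach(new ForeachWriter[Row] {
        // called once per partition and epoch; returning true means "process these rows"
        def open(partitionId: Long, epochId: Long): Boolean = true
        // called for every row of the stream
        def process(row: Row): Unit = handle(row.getString(0), row.getString(1))
        // called when the partition is done; errorOrNull is non-null on failure
        def close(errorOrNull: Throwable): Unit = ()
      })
      .start()

    query.awaitTermination()
  }
}
```

Note that `process` runs on the executors, so anything it uses must be serializable, and output printed there ends up in the executor logs rather than in the notebook. `foreachBatch` is usually the better fit when an existing batch writer can be reused; `foreach` fits custom per-row logic.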
    