How to write a stream to S3 partitioned by the year, month and day on which the records were received?

梦谈多话 2021-01-14 06:07

I have a simple stream that reads some data from a Kafka topic:

    val ds = spark
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.s


        
1 answer
  •  囚心锁ツ
    2021-01-14 06:43

    Add year, month and day columns derived from the current date, then pass them to the partitionBy clause:

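    import org.apache.spark.sql.functions._
    import spark.implicits._ // enables the $"..." column syntax

    // current_date() is the date on which Spark processes the records
    df.select(
        dayofmonth(current_date()) as "day",
        month(current_date()) as "month",
        year(current_date()) as "year",
        $"*") // keep all original columns
      .writeStream
      .partitionBy("year", "month", "day") // one directory level per column
      ... // all other options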
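
For illustration, here is a minimal end-to-end sketch of the same idea. It is not taken from the question. The broker address, topic name, bucket paths and trigger interval are all hypothetical placeholders, and it assumes the spark-sql-kafka and hadoop-aws packages are on the classpath. As far as I know, current_date() in a streaming query is evaluated once per micro-batch in the session time zone, so the columns reflect when Spark processed the records rather than when they were produced.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, current_date, dayofmonth, month, year}
import org.apache.spark.sql.streaming.Trigger

object PartitionedS3Sink {

  // Prepends day/month/year columns taken from the processing date,
  // keeping every original column after them.
  def withReceivedDate(df: DataFrame): DataFrame =
    df.select(
      dayofmonth(current_date()).as("day"),
      month(current_date()).as("month"),
      year(current_date()).as("year"),
      col("*"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-to-s3").getOrCreate()

    val ds = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // hypothetical broker
      .option("subscribe", "events")                    // hypothetical topic
      .load()
      .selectExpr("CAST(value AS STRING) AS value")

    val query = withReceivedDate(ds)
      .writeStream
      .format("parquet")
      .partitionBy("year", "month", "day")
      .option("path", "s3a://my-bucket/events")                           // hypothetical bucket
      .option("checkpointLocation", "s3a://my-bucket/checkpoints/events") // hypothetical path
      .trigger(Trigger.ProcessingTime("1 minute"))                        // hypothetical interval
      .start()

    query.awaitTermination()
  }
}
```

With this layout the file sink writes Hive-style directories under the output path, for example year=2021/month=1/day=14, which later reads can use for partition pruning.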
    
