Reading multiple files from S3 in Spark by date period

后悔当初 2020-12-15 04:16

Description

I have an application which sends data to AWS Kinesis Firehose, and Firehose writes the data into my S3 bucket. Firehose uses a "yyyy/MM/dd/HH" prefix format when writing the objects, and I want to read the data for a given date period (i.e. several of those hourly prefixes) into a single DataFrame in Spark.

1 Answer
  • 2020-12-15 04:43

    There is a much simpler solution. If you look at the DataFrameReader API you'll notice that there is a .json(paths: String*) method. Just build a collection of the paths you want, with globs or not, as you prefer, and then call the method, e.g.:

    // Build whatever list of S3 paths you need (with or without globs)
    val paths: Seq[String] = ...
    // Pass them all to the varargs json() method; Spark reads them as a single DataFrame
    val df = sqlContext.read.json(paths: _*)
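
    For example, here is a minimal sketch of how such a path list could be built for an hourly Firehose layout; the bucket name, prefix, and date range below are made-up placeholders:

    import java.time.LocalDateTime
    import java.time.format.DateTimeFormatter

    // Hypothetical bucket/prefix -- substitute your own Firehose destination
    val bucket  = "s3a://my-bucket/firehose-output"
    val hourFmt = DateTimeFormatter.ofPattern("yyyy/MM/dd/HH")

    // Assumed date period to read; one path per hour in the range
    val start = LocalDateTime.of(2020, 12, 14, 0, 0)
    val end   = LocalDateTime.of(2020, 12, 15, 0, 0)

    val paths: Seq[String] =
      Iterator.iterate(start)(_.plusHours(1))
        .takeWhile(!_.isAfter(end))
        .map(ts => s"$bucket/${ts.format(hourFmt)}/*")
        .toSeq

    val df = sqlContext.read.json(paths: _*)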
    