Reading multiple files from S3 in Spark by date period

Asked by 后悔当初, 2020-12-15 04:16

Description

I have an application which sends data to AWS Kinesis Firehose, and Firehose writes the data into my S3 bucket. Firehose uses the "yyyy/MM/dd/HH" prefix format when writing the objects. How can I read the files for a given date period into Spark?

1 Answer

Answered by 醉梦人生, 2020-12-15 04:43

    There is a much simpler solution. If you look at the DataFrameReader API, you'll notice that there is a .json(paths: String*) method. Just build a collection of the paths you want, with globs or not, as you prefer, and then call the method, e.g.,

    val paths: Seq[String] = ...
    val df = sqlContext.read.json(paths: _*)
    
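    For the date-period part of the question, here is a minimal sketch of how such a paths collection could be built. The bucket name and date range are hypothetical, and it assumes the "yyyy/MM/dd/HH" prefix layout described above, with a trailing * glob to cover the hourly sub-prefixes:

    import java.time.LocalDate
    import java.time.format.DateTimeFormatter

    // Hypothetical bucket and date range; adjust to your own delivery stream.
    val bucket = "s3://my-firehose-bucket"
    val fmt    = DateTimeFormatter.ofPattern("yyyy/MM/dd")
    val start  = LocalDate.of(2020, 12, 1)
    val end    = LocalDate.of(2020, 12, 15)

    // One glob per day in the period; the * matches the HH sub-prefixes.
    val paths: Seq[String] =
      Iterator.iterate(start)(_.plusDays(1))
        .takeWhile(!_.isAfter(end))
        .map(day => s"$bucket/${day.format(fmt)}/*")
        .toSeq

    val df = sqlContext.read.json(paths: _*)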
