Does one batch interval of data generate exactly one RDD in a DStream, regardless of how much data arrives?
In the Spark Streaming Programming Guide, under Discretized Streams (DStreams), it says:
Each RDD in a DStream contains data from a certain interval
It's very late to reply to this thread, but it's worth adding a few more points. The number of RDDs per batch depends on how many receivers your application has, since each receiver backs its own input DStream and every input DStream produces its own RDD per batch interval. That is why an application that reads through several receivers sees multiple RDDs per batch. If you have only one receiver, or Kafka as a receiver-less source, you get only one RDD per batch.
Yes, there is exactly one RDD per batch interval. It is produced at every batch interval regardless of the number of records it contains (there could be zero records inside).
If that weren't the case, and RDD creation were conditioned on the number of elements, you wouldn't have synchronous (micro-batching) streaming, but rather a form of asynchronous processing.
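To make the "one RDD per interval, even if empty" point concrete, here is a minimal pure-Python sketch of the idea. It is a toy model, not Spark code: `discretize`, its parameters, and the sample records are all hypothetical names made up for illustration, and each inner list merely stands in for the single RDD of one batch interval.

```python
from typing import Any, List, Tuple


def discretize(
    records: List[Tuple[float, Any]],
    batch_interval: float,
    num_batches: int,
) -> List[List[Any]]:
    """Toy model of micro-batching (NOT the Spark API).

    Each record is a (timestamp_in_seconds, value) pair. The result always
    has exactly `num_batches` entries, one per batch interval, no matter how
    many records fall into each interval.
    """
    if batch_interval <= 0:
        raise ValueError("batch_interval must be positive")

    # One (possibly empty) batch is created per interval up front,
    # independent of the data.
    batches: List[List[Any]] = [[] for _ in range(num_batches)]

    for timestamp, value in records:
        index = int(timestamp // batch_interval)
        # Records outside the modelled time window are ignored.
        if 0 <= index < num_batches:
            batches[index].append(value)

    return batches


if __name__ == "__main__":
    sample = [(0.1, "a"), (0.2, "b"), (0.9, "c"), (2.5, "d")]
    for i, batch in enumerate(discretize(sample, batch_interval=1.0, num_batches=3)):
        print(f"batch {i}: {len(batch)} record(s) {batch}")
```

With a 1-second interval, the first batch holds three records, the second holds none, and the third holds one. The number of batches is fixed by the clock, not by the volume of data, which is the property the answer describes.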