Is there a way to limit the number of records fetched from a JDBC source using Spark SQL 2.2.0?
I am dealing with a task of moving (and transforming) a large number of records from a JDBC source.
I have not tested this, but you should try using `limit` instead of `take`. `take` calls `head` under the covers, which carries the following note:

> this method should only be used if the resulting array is expected to be small, as all the data is loaded into the driver's memory.

whereas `limit` results in a LIMIT being pushed into the SQL query, since it is lazily evaluated:
> The difference between this function and `head` is that `head` is an action and returns an array (by triggering query execution) while `limit` returns a new Dataset.
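As a minimal sketch of what that could look like (the connection details below are placeholders, not from your question), you can apply `limit` to a JDBC-backed DataFrame and call `explain()` to inspect how the limit actually ends up in the physical plan:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("jdbc-limit-example")
  .getOrCreate()

// Hypothetical connection details -- substitute your own.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://dbhost:5432/mydb")
  .option("dbtable", "big_table")
  .option("user", "user")
  .option("password", "password")
  .load()

// limit is a transformation: it returns a new Dataset without
// triggering execution, so nothing is fetched yet.
val limited = df.limit(1000)

// Inspect the physical plan to confirm where the limit is applied.
limited.explain()
```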
If you do want the data on the driver, but without pulling everything in first, you could even do something like:

```scala
...load.limit(limitNum).take(limitNum)
```
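Spelled out with the hypothetical `df` from the sketch above:

```scala
val limitNum = 1000

// limit is applied lazily before take triggers execution, so at most
// limitNum rows should be fetched and returned to the driver.
val rows: Array[org.apache.spark.sql.Row] = df.limit(limitNum).take(limitNum)
```

Here `take(limitNum)` is the action that materializes the rows, while the preceding `limit(limitNum)` keeps the query itself bounded.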