spark, scala & jdbc - how to limit number of records

Asked by 春和景丽, 2021-01-27 04:42

Is there a way to limit the number of records fetched from the jdbc source using spark sql 2.2.0?

I am dealing with a task of moving (and transforming) a large number of …

3 Answers

Answered by 鱼传尺愫, 2021-01-27 05:11

    I have not tested this, but you should try using limit instead of take. take calls head under the covers, which carries the following note:

    this method should only be used if the resulting array is expected to be small, as all the data is loaded into the driver's memory.

    whereas limit results in a LIMIT being pushed into the SQL query, since it is lazily evaluated:

    The difference between this function and head is that head is an action and returns an array (by triggering query execution) while limit returns a new Dataset.

    If you then want the rows as a local array without pulling the whole table into the driver first, you could even do something like:

    ...load.limit(limitNum).take(limitNum)
    
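    Putting the answer together, the pattern can be sketched as a small Spark job. This is an untested sketch: it assumes a running SparkSession and a reachable JDBC source, and the connection URL, table name, credentials, and limitNum are all placeholders, not values from the question.

    ```scala
    import org.apache.spark.sql.SparkSession

    object JdbcLimitExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("jdbc-limit-example")
          .getOrCreate()

        val limitNum = 1000 // placeholder: how many records to fetch

        // Hypothetical connection details -- replace with your own.
        val df = spark.read
          .format("jdbc")
          .option("url", "jdbc:postgresql://dbhost:5432/mydb")
          .option("dbtable", "big_table")
          .option("user", "user")
          .option("password", "secret")
          .load()

        // limit is a transformation: it returns a new Dataset lazily,
        // restricting the query plan to at most limitNum rows.
        val limited = df.limit(limitNum)

        // take is an action: it triggers execution and brings the
        // (already limited) rows into the driver as a local array.
        val rows = limited.take(limitNum)

        rows.foreach(println)
        spark.stop()
      }
    }
    ```

    The point of chaining limit before take is that the limit is applied inside the query plan, so take never asks the source for more than limitNum rows.
    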
