How does Spark perform filters and aggregations on datasets that don't fit in memory?

Say I have a 1 TB Parquet dataset stored in S3, written out as individual files of 1 GB each.

I use Spark SQL to filter the dataset down based on one of the columns and the