Say I have a 1TB Parquet dataset stored in S3. It's written out as individual files of 1GB each.
I use Spark SQL to filter the data down based on one of the columns and the