How to load tar.gz files in streaming datasets?
Question: I would like to stream from tar-gzip (tgz) files that contain my actual data stored as CSV. I have already managed to get Structured Streaming working with Spark 2.2 when the data arrives as CSV files, but the data actually arrives as gzipped CSV files. Is there a way for the trigger fired by Structured Streaming to decompress the files before the CSV stream is processed?

The code I use to process the files is this:

val schema = Encoders.product[RawData].schema
val trackerData = spark
  .readStream
  .option(