pyspark select subset of files using regex/glob from s3
Question: I have a number of files on Amazon S3, each segregated by date (date=yyyymmdd). The files go back 6 months, but I would like to restrict my script to only use the last 3 months of data. I am unsure whether I will be able to use regular expressions to do something like sc.textFile("s3://path_to_dir/yyyy[m1,m2,m3]*"), where m1,m2,m3 represent the 3 months from the current date that I would like to use. One discussion also suggested using something like sc.textFile("s3://path_to_dir/yyyym1*",
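One possible sketch of the second approach (the names base and month_prefixes are illustrative, and "s3://path_to_dir" is the placeholder path from the question): compute the yyyymm prefixes for the last 3 months in Python, then join them into a single comma-separated path string, since sc.textFile accepts multiple comma-separated paths/globs in one call.

```python
from datetime import date

def month_prefixes(today, n_months=3):
    """Return yyyymm prefixes for the current month and the previous n_months - 1."""
    year, month = today.year, today.month
    prefixes = []
    for _ in range(n_months):
        prefixes.append("%04d%02d" % (year, month))
        month -= 1
        if month == 0:  # roll back into the previous year
            month, year = 12, year - 1
    return prefixes

# Build one comma-separated argument covering the last 3 months of
# date=yyyymmdd partitions (path layout assumed from the question).
base = "s3://path_to_dir"
paths = ",".join("%s/date=%s*" % (base, p) for p in month_prefixes(date.today()))
# rdd = sc.textFile(paths)
```

This sidesteps glob alternation entirely: each month gets its own simple prefix glob, and Spark unions the matches.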