I want to read a CSV file with billions of rows while also inferring the schema:
df = spark.read.csv('s3://bucket/data/*', inferSchema=True, samplingRatio=0.0001)