I'm programming with PySpark on a Spark cluster. The data is large and split into pieces, so it can't be loaded into memory, and its sanity can't be checked easily.
basically it loo
With Apache Spark 2.0 you can let Spark infer the schema of your data. Overall, though, you'll still need to cast the values in your parser function, as argued above. From the PySpark documentation for SparkSession.createDataFrame:
"When schema is None, it will try to infer the schema (column names and types) from data, which should be an RDD of Row, or namedtuple, or dict."
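A minimal sketch of how the two pieces fit together, assuming each raw line looks like "name,age,score" (a hypothetical layout; adapt the fields to your data). The parser casts every field, because inference only sees the Python values it is handed: a field left as a string is inferred as a string column. It returns a namedtuple, one of the input types the quote lists. The names Record, parse_line, safe_parse and to_dataframe are illustrative, not part of any API. Malformed lines are dropped rather than crashing the job, since the data is too large to inspect by hand.

```python
from collections import namedtuple

# Hypothetical record layout (an assumption for illustration): "name,age,score".
Record = namedtuple("Record", ["name", "age", "score"])


def parse_line(line):
    """Split one raw line and cast each field to the type it should have.

    Inference only sees the Python values handed to Spark: fields left as
    str would be inferred as strings, so the casts here are what let Spark
    infer a long column (from int) and a double column (from float).
    """
    name, age, score = line.split(",")
    return Record(name.strip(), int(age), float(score))


def safe_parse(line):
    """Like parse_line, but return None for malformed lines instead of raising."""
    try:
        return parse_line(line)
    except ValueError:
        # Wrong number of fields or a non-numeric value both raise ValueError.
        return None


def to_dataframe(spark, lines_rdd):
    """Build a DataFrame from an RDD of raw text lines with an inferred schema.

    Assumes `spark` is an existing SparkSession and `lines_rdd` is an RDD of
    strings, e.g. spark.sparkContext.textFile(path).
    """
    records = lines_rdd.map(safe_parse).filter(lambda r: r is not None)
    # No schema argument: Spark infers column names from the namedtuple
    # fields and column types from the already-cast values.
    return spark.createDataFrame(records)
```

Usage would be along the lines of df = to_dataframe(spark, spark.sparkContext.textFile(path)) followed by df.printSchema(), which should report name as a string, age as a long and score as a double. If you would rather not rely on inference at all, you can pass an explicit StructType as the schema argument of createDataFrame instead; the casts in the parser are still needed either way.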