I am new to Spark and was playing around with pyspark.sql. According to the pyspark.sql documentation here, one can define a Spark DataFrame and its schema like this:
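Something along these lines (a minimal sketch; the column names and types are just placeholders, chosen to mirror the datatype string mentioned further down):

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, TimestampType, IntegerType

spark = SparkSession.builder.appName("schema-example").getOrCreate()

# Each StructField takes (name, dataType, nullable)
schema = StructType([
    StructField("Name", StringType(), True),
    StructField("DateTime", TimestampType(), True),
    StructField("Age", IntegerType(), True),
])

df = spark.createDataFrame([], schema)  # empty DataFrame with this schema
df.printSchema()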
The nullable flag means that the column allows null values when it is True, and that null values are not allowed when it is False.
StructField(name, dataType, nullable): Represents a field in a StructType. The name of a field is indicated by name. The data type of a field is indicated by dataType. nullable is used to indicate whether values of this field can be null.
Refer to the Spark SQL and DataFrame Guide for more information.
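For example, with the default schema verification in createDataFrame, a None in a non-nullable column is rejected. A small sketch, reusing the spark session from above (the exact exception type can vary between Spark versions):

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

strict_schema = StructType([
    StructField("Name", StringType(), False),  # nullable=False: no nulls allowed
    StructField("Age", IntegerType(), True),   # nullable=True: nulls allowed
])

# Age is nullable, so a None there is accepted
spark.createDataFrame([("Alice", None)], strict_schema).show()

# Name is not nullable, so a None there fails schema verification
try:
    spark.createDataFrame([(None, 30)], strict_schema)
except Exception as e:
    print(e)  # e.g. field Name: This field is not nullable, but got None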
You can also use a datatype string:
schema = 'Name STRING, DateTime TIMESTAMP, Age INTEGER'
There's not much documentation on datatype strings, but the docs do mention them. They're much more compact and readable than StructTypes.
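For instance, such a string can be passed directly as the schema argument of createDataFrame. A sketch, assuming the spark session from above and a made-up sample row:

import datetime

df = spark.createDataFrame(
    [("Alice", datetime.datetime(2023, 1, 1, 12, 0, 0), 30)],
    schema='Name STRING, DateTime TIMESTAMP, Age INTEGER',
)
df.printSchema()
# root
#  |-- Name: string (nullable = true)
#  |-- DateTime: timestamp (nullable = true)
#  |-- Age: integer (nullable = true)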