I wrote the following code in both Scala and Python; however, the DataFrame that is returned doesn't appear to apply the non-nullable fields in my schema that I am applyin
In general, Spark Datasets either inherit the nullable property from their parents or infer it from the external data types.
You can argue whether this is a good approach, but ultimately it is sensible. If the semantics of a data source don't support nullability constraints, then applying a schema cannot enforce them either. At the end of the day, it is always better to assume that things can be null than to fail at runtime if the opposite assumption turns out to be incorrect.