Question

In PySpark you can define a schema and read data sources with this pre-defined schema, e.g.:

    from pyspark.sql.types import StructType, StructField, DoubleType, StringType

    # Each field: name, data type, nullable flag
    Schema = StructType([
        StructField("temperature", DoubleType(), True),
        StructField("temperature_unit", StringType(), True),
        StructField("humidity", DoubleType(), True),
        StructField("humidity_unit", StringType(), True),
        StructField("pressure", DoubleType(), True),
        StructField("pressure_unit", StringType(), True)
    ])

For some data sources it is possible to infer the schema from the data source and get