How to read a Parquet file, change datatypes and write to another Parquet file in Hadoop using PySpark
Question: My source Parquet file stores every column as a string. My destination Parquet file needs these columns converted to other datatypes, such as int, string, date, etc. How do I do this?

Answer 1: You may want to apply a user-defined schema to speed up data loading. There are two ways to do that:

1. Pass an input DDL-formatted string:

   spark.read.schema("a INT, b STRING, c DOUBLE").parquet("test.parquet")

2. Use a StructType schema:

   customSchema = StructType([
       StructField("a", IntegerType(), True),
       StructField("b", StringType(), True),
       StructField("c", DoubleType(), True)
   ])
   spark.read.schema(customSchema).parquet("test.parquet")