How to get the schema definition from a dataframe in PySpark?

萌比男神i  2021-02-12 14:25

In PySpark you can define a schema and read data sources with this pre-defined schema, e.g.:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

schema = StructType([
    StructField('name', StringType(), True),
    StructField('age', IntegerType(), True)
])
# read a data source with the pre-defined schema (placeholder path)
df = spark.read.csv('path/to/file.csv', schema=schema)

How can I get this kind of schema definition from an existing DataFrame?

4 Answers
  •  Happy的楠姐
    2021-02-12 14:37

    You can re-use the schema of an existing DataFrame:

    l = [('Ankita', 25, 'F'), ('Jalfaizy', 22, 'M'), ('saurabh', 20, 'M'), ('Bala', 26, None)]
    people_rdd = spark.sparkContext.parallelize(l)
    schemaPeople = people_rdd.toDF(['name', 'age', 'gender'])
    
    schemaPeople.show()
    
    +--------+---+------+
    |    name|age|gender|
    +--------+---+------+
    |  Ankita| 25|     F|
    |Jalfaizy| 22|     M|
    | saurabh| 20|     M|
    |    Bala| 26|  null|
    +--------+---+------+
    
    spark.createDataFrame(people_rdd, schemaPeople.schema).show()
    
    +--------+---+------+
    |    name|age|gender|
    +--------+---+------+
    |  Ankita| 25|     F|
    |Jalfaizy| 22|     M|
    | saurabh| 20|     M|
    |    Bala| 26|  null|
    +--------+---+------+
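
    The same schema object can also be handed to a reader, so a data source is parsed with the layout copied from an existing DataFrame. A minimal sketch, assuming a hypothetical CSV file people.csv:

    # Reuse the extracted schema when reading a file (hypothetical path).
    df = spark.read.schema(schemaPeople.schema).csv('people.csv')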
    

    Just use df.schema to get the underlying schema of a DataFrame:

    schemaPeople.schema
    
    StructType(List(StructField(name,StringType,true),StructField(age,LongType,true),StructField(gender,StringType,true)))
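
    If you need to persist the schema definition and rebuild it later, a StructType round-trips through JSON. A minimal sketch, assuming a hypothetical file name people_schema.json:

    import json
    from pyspark.sql.types import StructType

    # Serialize the schema of an existing DataFrame (hypothetical file name).
    with open('people_schema.json', 'w') as f:
        f.write(schemaPeople.schema.json())

    # Rebuild the same StructType from the saved JSON.
    with open('people_schema.json') as f:
        saved_schema = StructType.fromJson(json.load(f))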
    
