How to delete columns in pyspark dataframe

滥情空心 2021-01-30 01:55
>>> a
DataFrame[id: bigint, julian_date: string, user_id: bigint]
>>> b
DataFrame[id: bigint, quan_created_money: decimal(10,0), quan_created_cnt: bigi         


        
8 answers
  •  爱一瞬间的悲伤
    2021-01-30 02:37

    There are two ways to do this:

    1: Select only the columns you want to keep:

    drop_column_list = ["drop_column"]
    df = df.select([column for column in df.columns if column not in drop_column_list])  
    

    2: Use drop(), which is the more elegant way:

    df = df.drop("col_name")
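
Both ways can be wrapped in small helpers, roughly like the sketch below. The function names (select_without, drop_columns, demo) and the sample row are made up for illustration; only the column names come from DataFrame a in the question. The helpers rely only on df.columns, df.select() and df.drop(), and drop() accepts several column names at once when they are passed as separate arguments.

```python
def select_without(df, drop_column_list):
    """Way 1: select every column that is not in drop_column_list."""
    to_drop = set(drop_column_list)
    return df.select([c for c in df.columns if c not in to_drop])


def drop_columns(df, drop_column_list):
    """Way 2: pass the names to DataFrame.drop() as separate arguments."""
    return df.drop(*drop_column_list)


def demo():
    """Hypothetical usage; needs PySpark and a local Spark runtime."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("drop-demo").getOrCreate()
    # Illustrative row; the column names mirror DataFrame `a` in the question.
    a = spark.createDataFrame([(1, "2021030", 7)], ["id", "julian_date", "user_id"])
    print(select_without(a, ["julian_date"]).columns)
    print(drop_columns(a, ["julian_date", "user_id"]).columns)
    spark.stop()
```

Call demo() in an environment where PySpark is installed. Neither helper triggers a Spark job, because select() and drop() only build a new DataFrame definition and no rows are moved.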
    

    You should avoid the collect() version: it sends the complete dataset to the driver, which costs a lot of memory and computing effort on large data.
