>>> a
DataFrame[id: bigint, julian_date: string, user_id: bigint]
>>> b
DataFrame[id: bigint, quan_created_money: decimal(10,0), quan_created_cnt: bigi
You can do this in two ways:
1: Select only the columns you want to keep:
drop_column_list = ["drop_column"]
df = df.select([column for column in df.columns if column not in drop_column_list])
2: Use drop(), which is the more elegant way:
df = df.drop("col_name")
You should avoid the collect() version: it sends the complete dataset to the driver, which is very expensive for a large DataFrame.
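
If you need to remove several columns at once, both approaches extend naturally. Here is a minimal sketch; the helper names `keep_columns` and `drop_columns` are made up for illustration and are not part of PySpark. It assumes `df` is a PySpark DataFrame and a Spark version where `drop()` accepts several column names.

```python
def keep_columns(all_columns, drop_column_list):
    """Return the column names to keep, in their original order."""
    to_drop = set(drop_column_list)
    return [column for column in all_columns if column not in to_drop]


def drop_columns(df, drop_column_list):
    """Drop several columns from a DataFrame in a single call.

    Assumes df.drop() accepts multiple column names as separate
    arguments, so the list is unpacked with *.
    """
    return df.drop(*drop_column_list)
```

For way 1 you would write `df = df.select(keep_columns(df.columns, drop_column_list))`, and for way 2 `df = drop_columns(df, drop_column_list)`. Neither one moves any data to the driver; both only change the query plan.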