How to delete columns in a PySpark DataFrame

滥情空心 2021-01-30 01:55
>>> a
DataFrame[id: bigint, julian_date: string, user_id: bigint]
>>> b
DataFrame[id: bigint, quan_created_money: decimal(10,0), quan_created_cnt: bigint]


        
8 Answers
  •  失恋的感觉
    2021-01-30 02:40

    You could either explicitly name the columns you want to keep, like so:

    keep = [a.id, a.julian_date, a.user_id, b.quan_created_money, b.quan_created_cnt]
    

    Or, more generally, you can include all columns except specific ones via a list comprehension. For example, to exclude the id column from b:

    keep = [a[c] for c in a.columns] + [b[c] for c in b.columns if c != 'id']
    

    Finally, apply the selection to your join result:

    d = a.join(b, a.id==b.id, 'outer').select(*keep)
    
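    The column-selection logic above can be sketched without a live Spark session by substituting plain Python lists for `a.columns` and `b.columns` (a minimal sketch; with real DataFrames you would keep the `a[c]` / `b[c]` column references, as in the answer):

    ```python
    # Stand-ins for a.columns and b.columns, using the schemas from the
    # question. With real DataFrames these lists come from the DataFrame
    # objects themselves, and the comprehension would collect a[c] / b[c]
    # Column objects rather than bare names.
    a_columns = ["id", "julian_date", "user_id"]
    b_columns = ["id", "quan_created_money", "quan_created_cnt"]

    # Keep every column of a, and every column of b except its 'id',
    # so the outer join result carries only one id column.
    keep = [c for c in a_columns] + [c for c in b_columns if c != "id"]
    print(keep)
    # ['id', 'julian_date', 'user_id', 'quan_created_money', 'quan_created_cnt']
    ```

    The same filtering then drives `a.join(b, a.id == b.id, 'outer').select(*keep)` on the real DataFrames.
    
    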
