>>> a
DataFrame[id: bigint, julian_date: string, user_id: bigint]
>>> b
DataFrame[id: bigint, quan_created_money: decimal(10,0), quan_created_cnt: bigi
You could either explicitly name the columns you want to keep, like so:
keep = [a.id, a.julian_date, a.user_id, b.quan_created_money, b.quan_created_cnt]
Or, as a more general approach, you could include all columns except a specific one via a list comprehension. For example (excluding the id column from b):
keep = [a[c] for c in a.columns] + [b[c] for c in b.columns if c != 'id']
Finally, select those columns from your join result:
d = a.join(b, a.id == b.id, 'outer').select(*keep)
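If you need this in more than one place, the list comprehension can be wrapped in a small helper. This is only a sketch: the name `keep_columns` and its `exclude` parameter are hypothetical and not part of PySpark. It assumes only that each argument exposes `.columns` and supports `df[name]` lookups, as the DataFrames a and b above do.

```python
def keep_columns(left, right, exclude=("id",)):
    """Return every column of `left` plus the columns of `right` not in `exclude`.

    Hypothetical helper (not a PySpark API): it relies only on `.columns`
    and `df[name]`, which PySpark DataFrames provide.
    """
    excluded = set(exclude)
    return [left[c] for c in left.columns] + [
        right[c] for c in right.columns if c not in excluded
    ]


# Assumed usage with the DataFrames a and b from above:
# keep = keep_columns(a, b, exclude=("id",))
# d = a.join(b, a.id == b.id, 'outer').select(*keep)
```

Passing the join key in `exclude` is what stops the id column from appearing twice in the result.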