Select columns in a PySpark DataFrame

小鲜肉 2021-02-03 21:45

I am looking for a way to select columns of my DataFrame in PySpark. For the first row, I know I can use df.first(), but I'm not sure about columns given that they do…

6 Answers
  •  醉话见心
    2021-02-03 22:43

    The select method accepts column names (strings) or column expressions (Column objects), passed either as separate arguments or as a single list. To select columns you can use:

    -- column names (strings):

    df.select('col_1', 'col_2', 'col_3')
    

    -- column objects:

    import pyspark.sql.functions as F
    
    df.select(F.col('col_1'), F.col('col_2'), F.col('col_3'))
    
    # or
    
    df.select(df.col_1, df.col_2, df.col_3)
    
    # or
    
    df.select(df['col_1'], df['col_2'], df['col_3'])
    

    -- a list of column names or column objects:

    df.select(*['col_1', 'col_2', 'col_3'])
    
    # or
    
    df.select(*[F.col('col_1'), F.col('col_2'), F.col('col_3')])
    
    # or
    
    df.select(*[df.col_1, df.col_2, df.col_3])
    

    The star operator * can be omitted, since select also accepts a list directly; it is used above only for consistency with functions like drop, which don't accept a list as a parameter.
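    To make the difference concrete, here is a minimal runnable sketch; the SparkSession setup, the toy data, and the cols variable are illustrative assumptions, not part of the original answer:

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F
    
    spark = SparkSession.builder.getOrCreate()
    
    # Hypothetical toy DataFrame; names and values are made up for illustration.
    df = spark.createDataFrame(
        [(1, 'a', True), (2, 'b', False)],
        ['col_1', 'col_2', 'col_3'],
    )
    
    cols = ['col_1', 'col_2']
    
    df.select(cols).show()                           # select accepts a list directly
    df.select(*cols).show()                          # ...or the unpacked names
    df.select(F.col('col_1'), df['col_2']).show()    # strings and Columns can be mixed
    
    df.drop(*cols).show()    # drop takes names as separate arguments;
                             # df.drop(cols) would typically raise a TypeError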
