Select columns in a PySpark DataFrame

小鲜肉 2021-02-03 21:45

I am looking for a way to select columns of my DataFrame in PySpark. For the first row, I know I can use df.first(), but I'm not sure about columns given that they do…

6 Answers
  •  醉话见心
    2021-02-03 22:43

    The select method accepts column names (strings) or column expressions (Column objects), passed either as separate arguments or as a single list. To select columns you can use:

    -- column names (strings):

    df.select('col_1', 'col_2', 'col_3')
    

    -- column objects:

    import pyspark.sql.functions as F
    
    df.select(F.col('col_1'), F.col('col_2'), F.col('col_3'))
    
    # or
    
    df.select(df.col_1, df.col_2, df.col_3)
    
    # or
    
    df.select(df['col_1'], df['col_2'], df['col_3'])
    

    -- a list of column names or column objects:

    df.select(*['col_1', 'col_2', 'col_3'])
    
    # or
    
    df.select(*[F.col('col_1'), F.col('col_2'), F.col('col_3')])
    
    # or
    
    df.select(*[df.col_1, df.col_2, df.col_3])
    

    The star operator * can be omitted, since select also accepts a list directly; it is used above only for consistency with functions like drop, which don't accept a list as a parameter.
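    To make the difference concrete, here is a minimal runnable sketch; the SparkSession setup, the toy data, and the cols variable are illustrative assumptions, not part of the original answer:

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F
    
    spark = SparkSession.builder.getOrCreate()
    
    # Hypothetical toy DataFrame; names and values are made up for illustration.
    df = spark.createDataFrame(
        [(1, 'a', True), (2, 'b', False)],
        ['col_1', 'col_2', 'col_3'],
    )
    
    cols = ['col_1', 'col_2']
    
    df.select(cols).show()                           # select accepts a list directly
    df.select(*cols).show()                          # ...or the unpacked names
    df.select(F.col('col_1'), df['col_2']).show()    # strings and Columns can be mixed
    
    df.drop(*cols).show()    # drop takes names as separate arguments;
                             # df.drop(cols) would typically raise a TypeError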
