Pyspark : select specific column with its position

时光取名叫无心 2021-01-18 08:42

I would like to know how to select a specific column of a DataFrame by its position (number) rather than by its name.

Like this in Pandas:

df = df.iloc[:,2]
         
    野趣味 (original poster)
    2021-01-18 09:18

    You can always get the column's name with df.columns[n] (df.columns is a plain Python list of column names) and then select it by name. Given this example DataFrame:

    # `spark` is an existing SparkSession (predefined in the pyspark shell)
    df = spark.createDataFrame([[1, 2], [3, 4]], ['a', 'b'])
    

    To select the column at position n (positions start at 0):

    n = 1
    df.select(df.columns[n]).show()
    +---+
    |  b|
    +---+
    |  2|
    |  4|
    +---+
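    Because df.columns is an ordinary Python list, the same lookup extends to the other column keys that pandas' iloc accepts: negative positions, slices and lists of positions. Below is a minimal sketch using a hypothetical helper, column_names_at (not part of PySpark), that turns such a key into a list of names; it assumes only that df.columns is a list of strings.

```python
from typing import List, Sequence, Union

# Column keys in the spirit of pandas' df.iloc[:, key]:
# a single position, a slice, or a list of positions.
ColumnKey = Union[int, slice, Sequence[int]]


def column_names_at(columns: Sequence[str], key: ColumnKey) -> List[str]:
    """Hypothetical helper: map an iloc-style column key to column names.

    `columns` is expected to be `df.columns` from a PySpark DataFrame,
    i.e. a plain list of column-name strings.
    """
    if isinstance(key, int):
        # Negative positions count from the end; out-of-range raises IndexError.
        return [columns[key]]
    if isinstance(key, slice):
        return list(columns[key])
    # Otherwise treat the key as an iterable of positions.
    return [columns[i] for i in key]


if __name__ == "__main__":
    cols = ["a", "b", "c", "d"]  # stand-in for df.columns
    print(column_names_at(cols, 2))            # ['c']       like iloc[:, 2]
    print(column_names_at(cols, slice(1, 3)))  # ['b', 'c']  like iloc[:, 1:3]
    print(column_names_at(cols, [0, -1]))      # ['a', 'd']  like iloc[:, [0, -1]]
```

    With a real DataFrame this would be used as df.select(column_names_at(df.columns, slice(1, 3))), since select accepts a list of column names. The selection still happens by name, so a DataFrame with duplicate column names may still be reported as ambiguous.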
    

    To select all columns except column n:

    n = 1
    

    You can either use drop:

    df.drop(df.columns[n]).show()
    +---+
    |  a|
    +---+
    |  1|
    |  3|
    +---+
    

    Or select a manually constructed list of the remaining column names:

    df.select(df.columns[:n] + df.columns[n+1:]).show()
    +---+
    |  a|
    +---+
    |  1|
    |  3|
    +---+
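    The slicing approach generalizes to leaving out several positions at once. A minimal sketch, using a hypothetical helper columns_except (not a PySpark API) that builds the list of remaining names from df.columns: negative positions are normalized first so that -1 means the last column, and out-of-range positions raise IndexError.

```python
from typing import Iterable, List, Sequence


def columns_except(columns: Sequence[str], positions: Iterable[int]) -> List[str]:
    """Hypothetical helper: names of all columns except those at `positions`.

    `columns` is expected to be `df.columns` (a list of column-name strings).
    """
    n = len(columns)
    dropped = set()
    for p in positions:
        if not -n <= p < n:
            raise IndexError(f"column position {p} out of range for {n} columns")
        dropped.add(p + n if p < 0 else p)  # normalize negative positions
    # Keep the remaining columns in their original order.
    return [name for i, name in enumerate(columns) if i not in dropped]


if __name__ == "__main__":
    cols = ["a", "b", "c", "d"]  # stand-in for df.columns
    print(columns_except(cols, [1]))      # ['a', 'c', 'd']
    print(columns_except(cols, [0, -1]))  # ['b', 'c']
```

    With Spark this would be used as df.select(columns_except(df.columns, [0, 2])). The range check up front turns a mistyped position into an error instead of quietly returning every column.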
    
