PySpark: select a specific column by its position

Asked by 时光取名叫无心 on 2021-01-18 08:42

How can I select a specific column of a DataFrame by its position (number) rather than by its name?

For example, in pandas:

df = df.iloc[:, 2]  # all rows, column at position 2 (the third column)


        
2 answers
  • 2021-01-18 09:18

    You can always get the name of the column at position n with df.columns[n] (positions are zero-based) and then select it by name:

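    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()  # reuse or start a session
    df = spark.createDataFrame([[1, 2], [3, 4]], ['a', 'b'])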
    

    To select the column at position n:

    n = 1
    df.select(df.columns[n]).show()
    +---+                                                                           
    |  b|
    +---+
    |  2|
    |  4|
    +---+
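The same idea extends to pandas-style positional selection with a slice or a list of positions, as in df.iloc[:, 1:3] or df.iloc[:, [0, 2]]. This is a minimal sketch, assuming you only need to translate positions into names and then pass them to df.select; the helper name names_by_position is hypothetical and not part of PySpark. Since df.columns is a plain Python list of strings, the helper is ordinary list handling:

```python
from typing import List, Sequence, Union


# Hypothetical helper (not a PySpark API): turn an iloc-style column spec
# into the list of column names that df.select() accepts.
def names_by_position(columns: Sequence[str],
                      spec: Union[int, slice, Sequence[int]]) -> List[str]:
    cols = list(columns)
    if isinstance(spec, int):
        # A single position; negative indices count from the end, as in Python.
        return [cols[spec]]
    if isinstance(spec, slice):
        # Python slice semantics, e.g. slice(1, 3) keeps positions 1 and 2.
        return cols[spec]
    # Otherwise treat spec as explicit positions, returned in the given order.
    return [cols[i] for i in spec]
```

With the two-column df above, df.select(names_by_position(df.columns, 1)) selects column b, and df.select(names_by_position(df.columns, slice(0, 2))) selects both columns. An out-of-range position raises IndexError before Spark is involved.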
    

    To select all columns except column n:

    n = 1
    

    You can either use drop:

    df.drop(df.columns[n]).show()
    +---+
    |  a|
    +---+
    |  1|
    |  3|
    +---+
    

    Or select a manually constructed list of the remaining column names:

    df.select(df.columns[:n] + df.columns[n+1:]).show()
    +---+
    |  a|
    +---+
    |  1|
    |  3|
    +---+
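To leave out several positions at once, concatenating slices gets awkward; a comprehension over enumerate(df.columns) keeps the remaining names in their original order. This is a sketch under the assumption that column names are unique; names_except is a hypothetical helper name:

```python
from typing import Iterable, List, Sequence


# Hypothetical helper (not a PySpark API): names of all columns whose
# position is NOT in `positions`, keeping the original column order.
def names_except(columns: Sequence[str], positions: Iterable[int]) -> List[str]:
    n_cols = len(columns)
    # Map negative positions (e.g. -1 for the last column) to 0..n_cols-1;
    # positions that do not exist are simply ignored.
    skip = {p + n_cols if p < 0 else p for p in positions}
    return [name for i, name in enumerate(columns) if i not in skip]
```

On the example df above, df.select(names_except(df.columns, [1])) returns the same single column a as the two approaches shown.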
    
  • 2021-01-18 09:27

    Same idea as mirkhosro's answer:

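    For a DataFrame df, you can get the column at position n with df[n], where n is the zero-based index of the column. df[n] returns a Column, so it can be used directly in expressions such as filters.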

    Example:

    df = df.filter(df[3] != 0)  # df[3] is the Column at position 3
    

    removes the rows of df where the value in the fourth column (index 3) is 0. Rows where that column is null are dropped as well, because the comparison evaluates to null.
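If the condition should cover several positions, the per-column comparisons can be combined with the & operator; PySpark Column objects cannot be combined with Python's `and`. This is a minimal sketch, assuming you want to keep only rows where every listed column is non-zero; nonzero_at is a hypothetical name, and the helper relies only on df[i] and !=:

```python
import operator
from functools import reduce
from typing import Any, Sequence


# Hypothetical helper (not a PySpark API): AND together `df[i] != 0` for
# every position i. With a PySpark DataFrame, df[i] is a Column, so the
# result is a Column expression usable in df.filter().
# `positions` must be non-empty (reduce has no initial value here).
def nonzero_at(df: Any, positions: Sequence[int]) -> Any:
    # operator.and_ applies `&`; Python's `and` does not work on Columns.
    return reduce(operator.and_, (df[i] != 0 for i in positions))
```

Usage would look like df.filter(nonzero_at(df, [0, 3])). The same pattern with operator.or_ keeps rows where any of the listed columns is non-zero.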

    Note that you can check the column order, and therefore each column's position, with df.printSchema().
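Besides df.printSchema(), a quick way to see which position maps to which name, or to go from a name back to its position, is to work on df.columns directly. A small sketch; position_map and position_of are hypothetical helper names:

```python
from typing import Dict, Sequence


# Hypothetical helper: map each zero-based position to its column name,
# i.e. the index you would pass to df.columns[n] or df[n].
def position_map(columns: Sequence[str]) -> Dict[int, str]:
    return dict(enumerate(columns))


# Hypothetical helper: position of a column name. list.index returns the
# first match and raises ValueError if the name is missing.
def position_of(columns: Sequence[str], name: str) -> int:
    return list(columns).index(name)
```

For the two-column example df, print(position_map(df.columns)) prints {0: 'a', 1: 'b'}.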
