Spark DataFrame select based on column index

面向向阳花 2021-02-06 03:53

How do I select only the columns of a DataFrame that are at certain index positions in Scala?

For example, if a DataFrame has 100 columns and I want to extract only columns (10,12,13

3 Answers
  •  无人及你
    2021-02-06 04:15

    @user6910411's answer above works like a charm, and its number of tasks and its logical plan are similar to those of my approach below. However, my approach is a bit faster.
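    So I would suggest going with column names rather than column numbers. Column names are much safer and lighter than numbers: a name keeps referring to the same column even if the column order changes, while a position silently picks up whatever column lands there. You can use the following solution: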

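    // All column names of df
    val colNames = Seq("col1", "col2", /* ... */ "col99", "col100")
    
    // The names of the columns you want to keep
    val selectColNames = Seq("col1", "col3" /* , ... other selected column names ... */)
    
    // Turn each name into a Column bound to df
    val selectCols = selectColNames.map(name => df.col(name))
    
    // select takes Column varargs and returns a new DataFrame; df itself is unchanged
    val selectedDf = df.select(selectCols: _*)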
    

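A self-contained version of the name-based approach, as a minimal sketch: it assumes Spark SQL is on the classpath and runs in local mode, and it builds a small DataFrame with hypothetical columns `col1` to `col5` in place of the real 100-column one. The helper name `selectByNames` is illustrative, not part of Spark.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}

object SelectByName {
  // Hypothetical helper: keep only the columns named in `names`, in that order.
  def selectByNames(df: DataFrame, names: Seq[String]): DataFrame = {
    val cols: Seq[Column] = names.map(name => df.col(name))
    df.select(cols: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("select-by-name")
      .master("local[*]") // assumption: local mode, not a cluster
      .getOrCreate()
    import spark.implicits._

    // Hypothetical toy DataFrame standing in for the real 100-column one
    val df = Seq((1, 2, 3, 4, 5), (6, 7, 8, 9, 10))
      .toDF("col1", "col2", "col3", "col4", "col5")

    val selectedDf = selectByNames(df, Seq("col1", "col3"))
    println(selectedDf.columns.mkString(", ")) // col1, col3
    selectedDf.show()

    spark.stop()
  }
}
```

Using `df.col(name)` binds each column to `df` explicitly; `df.select(selectColNames.head, selectColNames.tail: _*)`, which uses the `String` overload of `select`, should be an equivalent alternative.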
    If you'd rather not type out all 100 column names, there is a shortcut too: read them from the DataFrame's schema.

    val colNames = df.schema.fieldNames
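
Because `df.schema.fieldNames` (equivalently `df.columns`) is an ordinary `Array[String]` in column order, it can also answer the original question directly: look up the names at the wanted positions and select those. A minimal sketch, assuming 0-based indexes like Scala arrays (subtract 1 first if your numbering starts at 1) and unique column names; `selectByIndex` is a hypothetical helper name, and an out-of-range index throws `ArrayIndexOutOfBoundsException` here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SelectByIndex {
  // Hypothetical helper: select columns by 0-based position by translating positions into names.
  def selectByIndex(df: DataFrame, indexes: Seq[Int]): DataFrame = {
    val names = df.schema.fieldNames // column names in schema order
    df.select(indexes.map(i => col(names(i))): _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("select-by-index")
      .master("local[*]") // assumption: local mode, not a cluster
      .getOrCreate()
    import spark.implicits._

    // Hypothetical toy DataFrame with columns c0 ... c4
    val df = Seq((0, 1, 2, 3, 4)).toDF("c0", "c1", "c2", "c3", "c4")

    val picked = selectByIndex(df, Seq(1, 3, 4))
    println(picked.columns.mkString(", ")) // c1, c3, c4

    spark.stop()
  }
}
```

The positions are resolved to names once, on the driver, so Spark still plans a normal name-based `select`.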
    
