Unpacking a list to select multiple columns from a Spark data frame

Asked 2020-12-07 14:03 by 隐瞒了意图╮

I have a Spark data frame df. Is there a way of sub-selecting a few columns using a list of these columns?

scala> df.columns
res0: Array[String] = ...

7 Answers
  • 2020-12-07 14:25

    You can do it like this:

    String[] originCols = ds.columns();
    ds.selectExpr(originCols);
    

    The Spark selectExpr source code:

    /**
     * Selects a set of SQL expressions. This is a variant of `select` that accepts
     * SQL expressions.
     *
     * {{{
     *   // The following are equivalent:
     *   ds.selectExpr("colA", "colB as newName", "abs(colC)")
     *   ds.select(expr("colA"), expr("colB as newName"), expr("abs(colC)"))
     * }}}
     *
     * @group untypedrel
     * @since 2.0.0
     */
    @scala.annotation.varargs
    def selectExpr(exprs: String*): DataFrame = {
      select(exprs.map { expr =>
        Column(sparkSession.sessionState.sqlParser.parseExpression(expr))
      }: _*)
    }
    
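    In Scala, since selectExpr takes a String* varargs parameter, a list of
    column names can be splatted into it directly. A minimal sketch, assuming
    df and illustrative column names:

    val cols = List("colA", "colB")
    df.selectExpr(cols: _*)   // each name is parsed as a SQL expression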
  • 2020-12-07 14:26

    You can convert each String to a Spark Column like this:

    import org.apache.spark.sql.functions._
    val cols = List("a", "b")   // illustrative column names
    df.select(cols.map(col): _*)
    
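    For a fully runnable version, here is a minimal self-contained sketch
    (the local SparkSession and the sample data are assumptions):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // toy frame with three columns; keep only "a" and "b"
    val df = Seq((1, "x", true), (2, "y", false)).toDF("a", "b", "c")
    val cols = List("a", "b")

    df.select(cols.map(col): _*).show()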
  • 2020-12-07 14:28

    First convert the String array to a List of Spark Column objects, as below:

    String[] strColNameArray = new String[]{"a", "b", "c", "d"};
    
    List<Column> colNames = new ArrayList<>();
    
    for(String strColName : strColNameArray){
        colNames.add(new Column(strColName));
    }
    

    Then convert the List within the select statement using the JavaConversions helpers, as below. You need the following import:

    import scala.collection.JavaConversions;
    
    Dataset<Row> selectedDF = df.select(JavaConversions.asScalaBuffer(colNames));
    
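    Note that scala.collection.JavaConversions is deprecated since Scala 2.12
    and was removed in 2.13. On current Scala versions the same conversion can
    be done explicitly on the Scala side; a sketch, assuming colNames is the
    java.util.List of Column built above:

    import scala.jdk.CollectionConverters._

    // asScala turns the java.util.List into a Scala Buffer; toSeq fits select's varargs
    val selectedDF = df.select(colNames.asScala.toSeq: _*)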
  • 2020-12-07 14:29

    Yes, you can make use of .select in Scala.

    Use .head and .tail to pass all of the values in the List:

    Example

    val cols = List("b", "c")
    df.select(cols.head, cols.tail: _*)
    

    Explanation

    select has an overload that takes a first column name plus a String*
    varargs tail (see the signatures below), so cols.head supplies the first
    argument and cols.tail: _* expands the remaining names into the varargs.
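    For reference, the two select overloads on Dataset that make this work are:

    def select(cols: Column*): DataFrame
    def select(col: String, cols: String*): DataFrame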
  • 2020-12-07 14:30

    Another option that I've just learnt.

    import org.apache.spark.sql.functions.col
    val columns = Seq[String]("col1", "col2", "col3")
    val colNames = columns.map(name => col(name))
    val selected = df.select(colNames: _*)   // bind a new val; df itself cannot be reassigned
    
  • 2020-12-07 14:41

    You can pass arguments of type Column* to select:

    import org.apache.spark.sql.Column

    val df = spark.read.json("example.json")
    val cols: List[String] = List("a", "b")
    // convert each String to a Column via df.apply
    val colList: List[Column] = cols.map(df(_))
    df.select(colList: _*)
    