I have a Spark DataFrame df. Is there a way of selecting a subset of columns using a list of these columns?

scala> df.columns
res0: Array[String] = Array(a, b, c, d)
You can do it like this:

String[] originCols = ds.columns();
ds.selectExpr(originCols);

This works because selectExpr is a varargs method, so a String[] can be passed directly. Spark's selectExpr source code:
/**
 * Selects a set of SQL expressions. This is a variant of `select` that accepts
 * SQL expressions.
 *
 * {{{
 *   // The following are equivalent:
 *   ds.selectExpr("colA", "colB as newName", "abs(colC)")
 *   ds.select(expr("colA"), expr("colB as newName"), expr("abs(colC)"))
 * }}}
 *
 * @group untypedrel
 * @since 2.0.0
 */
@scala.annotation.varargs
def selectExpr(exprs: String*): DataFrame = {
  select(exprs.map { expr =>
    Column(sparkSession.sessionState.sqlParser.parseExpression(expr))
  }: _*)
}
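The same approach works in Scala; a minimal sketch, assuming cols holds the column names you want (selectExpr parses each string as a SQL expression, so plain column names pass through unchanged):

val cols = Seq("a", "b")   // assumed column names
df.selectExpr(cols: _*)    // the Seq expands into the varargs parameter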
You can convert each String to a Spark Column with the col function:

import org.apache.spark.sql.functions._
df.select(cols.map(col): _*)

where cols is the list of column names you want, e.g. val cols = List("a", "b").
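For context, a self-contained sketch of this approach (the data and column names are assumed for illustration):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// assumed example data
val df = Seq((1, "x", true), (2, "y", false)).toDF("a", "b", "c")

val cols = List("a", "b")          // the columns to keep
df.select(cols.map(col): _*).show()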
First convert the String array to a List of Spark Column objects as below:

import org.apache.spark.sql.Column;
import java.util.ArrayList;
import java.util.List;

String[] strColNameArray = new String[]{"a", "b", "c", "d"};
List<Column> colNames = new ArrayList<>();
for (String strColName : strColNameArray) {
    colNames.add(new Column(strColName));
}

Then convert the List within the select statement using JavaConversions. You need the following import statement:

import scala.collection.JavaConversions;

Dataset<Row> selectedDF = df.select(JavaConversions.asScalaBuffer(colNames));

Note that JavaConversions is deprecated in newer Scala versions; scala.collection.JavaConverters provides the supported equivalents.
Yes, you can use .select in Scala. Use .head and .tail to pass all the values in the List:

Example

val cols = List("b", "c")
df.select(cols.head, cols.tail: _*)

Explanation

The string-based overload of select has the signature select(col: String, cols: String*). It requires at least one column name, so the first element is passed as head and the remaining elements are expanded into the varargs tail.
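One caveat: cols.head throws on an empty list. A minimal guard, assuming an empty list should leave df unchanged:

val selected = if (cols.isEmpty) df else df.select(cols.head, cols.tail: _*)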
Another option that I've just learnt:

import org.apache.spark.sql.functions.col

val columns = Seq[String]("col1", "col2", "col3")
val colNames = columns.map(name => col(name))
val selectedDf = df.select(colNames: _*)
You can pass arguments of type Column* to select:

import org.apache.spark.sql.Column

val df = spark.read.json("example.json")
val cols: List[String] = List("a", "b")
// convert each String to a Column
val columns: List[Column] = cols.map(df(_))
df.select(columns: _*)