How to populate the select clause of a DataFrame dynamically? Getting AnalysisException

终归单人心 2020-12-12 06:12

I am using Spark SQL 2.4.1 and Java 8.

    val country_df = Seq(
        ("us", 2001),
        ("fr", 2002),
        ("jp", 2002),
        ("in", 2001),
        ("fr", 2003),
3 Answers
  • 2020-12-12 06:17
    scala> val colname = col_df.rdd.collect.toList.map(x => x(0).toString).toSeq
    
    scala> data_df.select(colname.head, colname.tail: _*).show()
    +----------+----------+
    |        us|        in|
    +----------+----------+
    |us_state_1|in_state_1|
    |us_state_2|in_state_2|
    |us_state_3|in_state_3|
    +----------+----------+
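    The snippet above assumes `col_df` and `data_df` already exist; since the question body is truncated, here is a hedged sketch of the shapes they would need for the output shown to reproduce (the sample values are illustrative, not from the original post):

    ```scala
    // Sketch only: assumes a live SparkSession named `spark`.
    import spark.implicits._

    // One-column DataFrame whose rows are the column names to select.
    val col_df = Seq("us", "in").toDF("country")

    // DataFrame that actually contains those columns.
    val data_df = Seq(
      ("us_state_1", "in_state_1"),
      ("us_state_2", "in_state_2"),
      ("us_state_3", "in_state_3")
    ).toDF("us", "in")

    // Collect the names on the driver, then expand them into select.
    // select(col: String, cols: String*) needs the head/tail split because
    // there is no overload taking a plain Seq[String].
    val colname = col_df.rdd.collect.toList.map(x => x(0).toString)
    data_df.select(colname.head, colname.tail: _*).show()
    ```

    Note that `collect` pulls the list to the driver, which is fine here because a list of column names is small.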
    
  • 2020-12-12 06:24

    You can try the code below.

    First, select the country values (the future column names) from the first dataset:

    // requires: import static org.apache.spark.sql.functions.col;
    List<String> columns = country_df.select("country")
            .where(col("data_yr").equalTo(2001))
            .as(Encoders.STRING())
            .collectAsList();
    

    Then use those column names with `selectExpr` on the second dataset, converting the Java list to a Scala Seq:

    public static Seq<String> convertListToSeq(List<String> inputList) {
            return JavaConverters.asScalaIteratorConverter(inputList.iterator()).asScala().toSeq();
    }
    
    
    //using selectExpr
    data_df.selectExpr(convertListToSeq(columns)).show(true);
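    For readers on the Scala API, the same two steps collapse to a few lines, since Scala can splat a Seq straight into `selectExpr`. A sketch, assuming the same `country_df`/`data_df` from the question and an in-scope `spark.implicits._`:

    ```scala
    // Scala equivalent of the two Java steps above (sketch only;
    // needs a live SparkSession and import spark.implicits._).
    val columns = country_df
      .filter($"data_yr" === 2001)     // keep only the 2001 rows
      .select("country")
      .as[String]                      // typed Dataset[String]
      .collect()                       // country names on the driver
      .toSeq

    // Splat the Seq into the varargs selectExpr.
    data_df.selectExpr(columns: _*).show(false)
    ```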
    
  • 2020-12-12 06:34

    Using pivot, you can turn the row values into column names directly:

    val selectCols = col_df.groupBy().pivot($"country").agg(lit(null)).columns
    data_df.select(selectCols.head, selectCols.tail: _*)
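    As written, the pivot picks up every country in `col_df`. A hedged variant that keeps only the 2001 rows (matching the question's filter, and assuming `col_df` carries the same `data_yr` column as the question's `country_df`) might look like this:

    ```scala
    // Sketch only: filter first so only 2001 countries become columns.
    // first(lit(null)) is a throwaway aggregate (from
    // org.apache.spark.sql.functions._); only the generated column
    // names are kept, the aggregated values are discarded.
    val selectCols = col_df
      .filter($"data_yr" === 2001)
      .groupBy()
      .pivot($"country")
      .agg(first(lit(null)))
      .columns

    data_df.select(selectCols.head, selectCols.tail: _*).show()
    ```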
    