How do I call a UDF on a Spark DataFrame using JAVA?

迷失自我 · 2020-12-01 12:49

Similar question as here, but I don't have enough points to comment there.

According to the latest Spark documentation a UDF can be used in two different ways: with SQL and with the DataFrame API. How do I call a UDF on a DataFrame directly in Java?

1 Answer
  • 2020-12-01 13:13

    Spark >= 2.3

    A Scala-style udf can be invoked directly:

    import static org.apache.spark.sql.functions.*;
    import org.apache.spark.sql.expressions.UserDefinedFunction;
    import org.apache.spark.sql.types.DataTypes;
    import scala.collection.Seq;
    
    // Return the first element of the array column, or null if it is empty.
    // (headOption() would return a Scala Option, which does not map to StringType.)
    UserDefinedFunction mode = udf(
      (Seq<String> ss) -> ss.isEmpty() ? null : ss.apply(0), DataTypes.StringType
    );
    
    df.select(mode.apply(col("vs"))).show();
    

    Spark < 2.3

    Even if we assume that your UDF is useful and cannot be replaced by a simple getItem call, it has an incorrect signature. Array columns are exposed using Scala WrappedArray, not plain Java arrays, so you have to adjust the signature:

    import org.apache.spark.sql.api.java.UDF1;
    import scala.collection.Seq;
    
    UDF1<Seq<String>, String> mode = new UDF1<Seq<String>, String>() {
      public String call(final Seq<String> types) throws Exception {
        // Unwrap explicitly instead of returning an Option:
        // first element, or null for empty arrays.
        return types.isEmpty() ? null : types.apply(0);
      }
    };
    

    Once the UDF is registered:

    sqlContext.udf().register("mode", mode, DataTypes.StringType);
    

    you can simply use callUDF (introduced in Spark 1.5) to call it by name:

    df.select(callUDF("mode", col("vs"))).show();
    

    You can also use it in selectExpr:

    df.selectExpr("mode(vs)").show();
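
    Putting the pieces together, here is a hedged end-to-end sketch (a minimal sketch, assuming Spark 2.3+, a local session, and a sample array column named vs; the class name, helper name, and sample data are made up for illustration):

    ```java
    import static org.apache.spark.sql.functions.*;
    
    import java.util.Arrays;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.api.java.UDF1;
    import org.apache.spark.sql.expressions.UserDefinedFunction;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;
    import scala.collection.Seq;
    
    public class UdfExample {
      // UDF body as a plain helper: first element of the array, or null if empty.
      static String firstOrNull(Seq<String> ss) {
        return (ss == null || ss.isEmpty()) ? null : ss.apply(0);
      }
    
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local[*]").appName("udf-example").getOrCreate();
    
        // Sample data: one column "vs" holding arrays of strings.
        StructType schema = new StructType()
            .add("vs", DataTypes.createArrayType(DataTypes.StringType));
        Dataset<Row> df = spark.createDataFrame(
            Arrays.asList(
                RowFactory.create((Object) new String[] {"a", "b"}),
                RowFactory.create((Object) new String[] {})),
            schema);
    
        // 1. Scala-style: build a UserDefinedFunction and apply it directly.
        //    The UDF1 cast disambiguates the overloaded udf(...) methods.
        UserDefinedFunction mode = udf(
            (UDF1<Seq<String>, String>) UdfExample::firstOrNull,
            DataTypes.StringType);
        df.select(mode.apply(col("vs"))).show();
    
        // 2. Register by name, then call it via callUDF ...
        spark.udf().register("mode",
            (UDF1<Seq<String>, String>) UdfExample::firstOrNull,
            DataTypes.StringType);
        df.select(callUDF("mode", col("vs"))).show();
    
        // 3. ... or from a SQL expression.
        df.selectExpr("mode(vs)").show();
    
        spark.stop();
      }
    }
    ```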
    