How do I call a UDF on a Spark DataFrame using JAVA?

迷失自我 · 2020-12-01 12:49

Similar question as here, but I don't have enough points to comment there.

According to the latest Spark documentation a UDF can be used in two different ways: with SQL and with the DataFrame API. How do I call a UDF on a DataFrame directly in Java?

1 Answer
  • 2020-12-01 13:13

    Spark >= 2.3

    A Scala-style udf can be invoked directly:

    import static org.apache.spark.sql.functions.*;
    import org.apache.spark.sql.expressions.UserDefinedFunction;
    import org.apache.spark.sql.types.DataTypes;
    import scala.collection.Seq;
    
    // Return the first element of the array column, or null if it is empty.
    // (headOption() would return a Scala Option, which does not map to StringType.)
    UserDefinedFunction mode = udf(
      (Seq<String> ss) -> ss.isEmpty() ? null : ss.apply(0), DataTypes.StringType
    );
    
    df.select(mode.apply(col("vs"))).show();
    

    Spark < 2.3

    Even if we assume that your UDF is useful and cannot be replaced by a simple getItem call, it has an incorrect signature. Array columns are exposed using Scala WrappedArray, not plain Java arrays, so you have to adjust the signature:

    import org.apache.spark.sql.api.java.UDF1;
    import scala.collection.Seq;
    
    UDF1<Seq<String>, String> mode = new UDF1<Seq<String>, String>() {
      public String call(final Seq<String> types) throws Exception {
        // Unwrap explicitly instead of returning an Option:
        // first element, or null for empty arrays.
        return types.isEmpty() ? null : types.apply(0);
      }
    };
    

    Once the UDF is registered:

    sqlContext.udf().register("mode", mode, DataTypes.StringType);
    

    you can simply use callUDF (introduced in Spark 1.5) to call it by name:

    df.select(callUDF("mode", col("vs"))).show();
    

    You can also use it in selectExpr:

    df.selectExpr("mode(vs)").show();
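
    Putting the pieces together, here is a hedged end-to-end sketch (a minimal sketch, assuming Spark 2.3+, a local session, and a sample array column named vs; the class name, helper name, and sample data are made up for illustration):

    ```java
    import static org.apache.spark.sql.functions.*;
    
    import java.util.Arrays;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.api.java.UDF1;
    import org.apache.spark.sql.expressions.UserDefinedFunction;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;
    import scala.collection.Seq;
    
    public class UdfExample {
      // UDF body as a plain helper: first element of the array, or null if empty.
      static String firstOrNull(Seq<String> ss) {
        return (ss == null || ss.isEmpty()) ? null : ss.apply(0);
      }
    
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local[*]").appName("udf-example").getOrCreate();
    
        // Sample data: one column "vs" holding arrays of strings.
        StructType schema = new StructType()
            .add("vs", DataTypes.createArrayType(DataTypes.StringType));
        Dataset<Row> df = spark.createDataFrame(
            Arrays.asList(
                RowFactory.create((Object) new String[] {"a", "b"}),
                RowFactory.create((Object) new String[] {})),
            schema);
    
        // 1. Scala-style: build a UserDefinedFunction and apply it directly.
        //    The UDF1 cast disambiguates the overloaded udf(...) methods.
        UserDefinedFunction mode = udf(
            (UDF1<Seq<String>, String>) UdfExample::firstOrNull,
            DataTypes.StringType);
        df.select(mode.apply(col("vs"))).show();
    
        // 2. Register by name, then call it via callUDF ...
        spark.udf().register("mode",
            (UDF1<Seq<String>, String>) UdfExample::firstOrNull,
            DataTypes.StringType);
        df.select(callUDF("mode", col("vs"))).show();
    
        // 3. ... or from a SQL expression.
        df.selectExpr("mode(vs)").show();
    
        spark.stop();
      }
    }
    ```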
    