Similar question to this one, but I don't have enough reputation to comment there.
According to the latest Spark documentation, a udf can be used in two different ways, depending on your Spark version:
Spark >= 2.3
A Scala-style udf can be invoked directly:
import static org.apache.spark.sql.functions.*;
import org.apache.spark.sql.expressions.UserDefinedFunction;
import org.apache.spark.sql.types.DataTypes;
import scala.collection.Seq;

// Note: headOption() would return an Option<String>, not a String,
// which doesn't match the declared StringType. Return the first
// element or null explicitly instead.
UserDefinedFunction mode = udf(
    (Seq<String> ss) -> ss.isEmpty() ? null : ss.head(), DataTypes.StringType
);
df.select(mode.apply(col("vs"))).show();
Spark < 2.3
Even if we assume that your UDF is useful and cannot be replaced by a simple getItem
call, it has an incorrect signature. Array columns are exposed using Scala WrappedArray,
not plain Java arrays, so you have to adjust the signature:
UDF1<Seq<String>, String> mode = new UDF1<Seq<String>, String>() {
    @Override
    public String call(final Seq<String> types) throws Exception {
        // headOption() would return an Option<String>, not a String;
        // return the first element, or null for an empty array
        return types.isEmpty() ? null : types.head();
    }
};
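For reference, the first-element-or-null behavior that Scala expresses with headOption() has a direct plain-Java analogue; a minimal sketch using java.util.List and Optional (no Spark dependency; the helper names headOption and firstOrNull are hypothetical, chosen only to mirror the Scala method):

```java
import java.util.List;
import java.util.Optional;

public class HeadOption {
    // Analogue of Scala's seq.headOption: the first element wrapped
    // in an Optional, or Optional.empty() for an empty list
    static <T> Optional<T> headOption(List<T> xs) {
        return xs.isEmpty() ? Optional.empty() : Optional.of(xs.get(0));
    }

    // What the UDF body actually needs: the first element or null,
    // because the declared return type is a plain StringType
    static <T> T firstOrNull(List<T> xs) {
        return headOption(xs).orElse(null);
    }

    public static void main(String[] args) {
        System.out.println(firstOrNull(List.of("a", "b"))); // a
        System.out.println(firstOrNull(List.<String>of())); // null
    }
}
```

Returning null (rather than an Option) is what makes the result line up with the declared StringType; Spark renders it as a SQL NULL.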
Once the UDF is registered:
sqlContext.udf().register("mode", mode, DataTypes.StringType);
you can use callUDF (introduced in Spark 1.5) to call it by name:
df.select(callUDF("mode", col("vs"))).show();
You can also use it in selectExpr:
df.selectExpr("mode(vs)").show();