I have the following types in a dataframe:
root
|-- id: string (nullable = true)
|-- items: array (nullable = true)
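Your code is almost right. All you have to do is replace List with Seq as the type of the UDF's argument, because Spark passes an array column to a Scala UDF as a Seq, not a List: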
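import org.apache.spark.sql.functions.udf

// recs is typed Seq[String] because that is how Spark exposes an array<string> column
def filterItems(flist: List[String]) = udf {
  (recs: Seq[String]) => recs.filter(item => flist.contains(item))
}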
It would also make sense to change the signature from List[String] => UserDefinedFunction to Seq[String] => UserDefinedFunction, but this is not required.
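
For illustration, here is a minimal sketch of how the UDF might be applied to a dataframe with the schema from the question. The sample rows, the output column name `filtered`, the object name and the local SparkSession settings are all assumptions made for the example, and the Seq[String] signature suggested above is used.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}

object FilterItemsExample {
  // Builds a UDF that keeps only the array elements present in flist.
  def filterItems(flist: Seq[String]): UserDefinedFunction = udf {
    (recs: Seq[String]) => recs.filter(item => flist.contains(item))
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumed settings).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("filter-items-example")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data matching the schema: id: string, items: array<string>.
    val df = Seq(
      ("a", Seq("x", "y", "z")),
      ("b", Seq("y"))
    ).toDF("id", "items")

    // Apply the UDF to the array column and store the result in a new column.
    val result = df.withColumn("filtered", filterItems(Seq("x", "z"))(col("items")))
    result.show()

    spark.stop()
  }
}
```

With this sample data, row "a" keeps x and z, and row "b" ends up with an empty array. Since items is nullable, a row with a null array would make recs.filter throw a NullPointerException, so you may want to wrap recs in Option(...) if nulls can occur.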
Reference: Spark SQL Programming Guide, Data Types (ArrayType is represented in Scala as scala.collection.Seq).