I am using PySpark 2.3.1 and would like to filter array elements with an expression, not a udf:
>>> df = spark.createDataFrame(
...     [(1, "A", [1, 2, 3, 4]), (2, "B", [1, 2, 3, 4, 5])],
...     ["col1", "col2", "col3"])
Spark < 2.4
There is no reasonable* replacement for udf in PySpark.
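For reference, a minimal udf sketch (assuming the integer-array col3 from the example above; the output column name is just for illustration):

from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, IntegerType

# Keep only elements >= 3; pass nulls through unchanged.
filter_geq_3 = udf(
    lambda xs: [x for x in xs if x >= 3] if xs is not None else None,
    ArrayType(IntegerType()))

df.withColumn("col3_filtered", filter_geq_3("col3"))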
Spark >= 2.4
Your code:
expr("filter(col3, x -> x >= 3)")
can be used as is.
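For example (withColumn and the result column name are illustrative):

from pyspark.sql.functions import expr

# filter is a Spark SQL higher-order function available since 2.4.
df.withColumn("col3_filtered", expr("filter(col3, x -> x >= 3)"))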
Reference
Querying Spark SQL DataFrame with complex types
* Given the cost of exploding or of converting to and from RDD, a udf is almost always preferable.
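For contrast, a sketch of the explode-based alternative the footnote refers to (assuming the col1/col2/col3 schema above):

from pyspark.sql.functions import col, explode, collect_list

# Explode the array, filter rows, then rebuild the array. This shuffles
# the data and silently drops rows whose arrays end up empty, which is
# why the udf is usually the better option before 2.4.
(df
    .withColumn("x", explode("col3"))
    .where(col("x") >= 3)
    .groupBy("col1", "col2")
    .agg(collect_list("x").alias("col3")))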