Filter array column content

Asked by 醉酒成梦 on 2021-01-13 13:22

I am using PySpark 2.3.1 and would like to filter array elements with an expression, not a UDF:

>>> df = spark.createDataFrame([(1, "A",
1 Answer
  • 2021-01-13 13:58

    Spark < 2.4

    There is no *reasonable* replacement for udf in PySpark.
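
    For Spark < 2.4, the usual workaround is to wrap a plain Python function in a udf. A minimal sketch of that function, assuming the array column is named col3 as in the question and holds integers (the element type is an assumption):

```python
# The plain Python function that a Spark < 2.4 udf would wrap.
# In Spark it would be registered roughly like this (sketch, assumes
# a DataFrame `df` with an integer array column "col3"):
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import ArrayType, IntegerType
#   filter_ge3 = udf(keep_ge3, ArrayType(IntegerType()))
#   df = df.withColumn("col3", filter_ge3("col3"))

def keep_ge3(xs):
    """Keep only elements >= 3, mirroring filter(col3, x -> x >= 3)."""
    return [x for x in xs if x >= 3]

print(keep_ge3([1, 2, 3, 5]))  # [3, 5]
```

    The udf carries serialization overhead for every row, which is why the expression-based approach below is preferred once it is available.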

    Spark >= 2.4

    Your code:

    expr("filter(col3, x -> x >= 3)")
    

    can be used as is.
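
    For intuition, here is a plain-Python model of what that SQL expression computes per row. The sample rows are an assumption (the question's DataFrame is truncated); only the array column matters:

```python
# Plain-Python model of filter(col3, x -> x >= 3): for each row, keep
# only the array elements that satisfy the predicate. Sample rows are
# hypothetical (id, label, col3) tuples.
rows = [(1, "A", [1, 2, 3]), (2, "B", [0, 5])]

filtered = [(i, s, [x for x in xs if x >= 3]) for i, s, xs in rows]
print(filtered)  # [(1, 'A', [3]), (2, 'B', [5])]
```

    In Spark itself the expression runs inside the JVM, so no data is shipped to a Python worker, unlike the udf approach.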

    Reference

    Querying Spark SQL DataFrame with complex types


    * Given the cost of exploding or converting to and from RDD, udf is almost exclusively preferable.
