Use “IS IN” between 2 Spark dataframe columns

春和景丽 2021-01-23 06:37

I have the following dataframe:

from pyspark.sql.types import *

rdd = sc.parallelize([
        ('ALT', ['chien', 'chat'

1 Answer
  • 2021-01-23 07:10

    You can use array_contains, called through expr so that both arguments are resolved as columns of the same row:

    from pyspark.sql.functions import expr
    
    test.withColumn("isinlist", expr("array_contains(Animaux, Animal)")).show()
    # +--------+---------------+------+--------+
    # |ClientId|        Animaux|Animal|isinlist|
    # +--------+---------------+------+--------+
    # |     ALT|  [chien, chat]|oiseau|   false|
    # |     ALT|       [oiseau]|oiseau|    true|
    # |     TDR|[poule, poulet]| poule|    true|
    # |     ALT|         [ours]| chien|   false|
    # |     ALT|         [paon]| tigre|   false|
    # |     TDR|  [tigre, lion]|  lion|    true|
    # |     ALT|         [chat]| chien|   false|
    # +--------+---------------+------+--------+
    

    Source: "How to filter Spark dataframe if one column is a member of another column" by zero323 (Scala).
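
To sanity-check the isinlist column without a Spark session, the same row-wise membership test can be sketched in plain Python. The rows below are copied from the output table in the answer; the helper name is_in_list is made up for this illustration, and the sketch does not model Spark's handling of null values.

```python
# Rows copied from the output table in the answer: (ClientId, Animaux, Animal).
rows = [
    ("ALT", ["chien", "chat"], "oiseau"),
    ("ALT", ["oiseau"], "oiseau"),
    ("TDR", ["poule", "poulet"], "poule"),
    ("ALT", ["ours"], "chien"),
    ("ALT", ["paon"], "tigre"),
    ("TDR", ["tigre", "lion"], "lion"),
    ("ALT", ["chat"], "chien"),
]


def is_in_list(animaux, animal):
    """Hypothetical helper: True when `animal` is an element of `animaux`."""
    return animal in animaux


# One boolean per row, in the same order as the table.
isinlist = [is_in_list(animaux, animal) for _, animaux, animal in rows]
print(isinlist)
```

The resulting list should line up, row by row, with the isinlist column shown in the answer.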
