I have the following dataframe:
from pyspark.sql.types import *
rdd = sc.parallelize([
    ('ALT', ['chien', 'chat'], 'oiseau'), ('ALT', ['oiseau'], 'oiseau'), ('TDR', ['poule', 'poulet'], 'poule'),
    ('ALT', ['ours'], 'chien'), ('ALT', ['paon'], 'tigre'), ('TDR', ['tigre', 'lion'], 'lion'), ('ALT', ['chat'], 'chien')])
test = rdd.toDF(['ClientId', 'Animaux', 'Animal'])
You can use array_contains:
from pyspark.sql.functions import expr
test.withColumn("isinlist", expr("array_contains(Animaux, Animal)")).show()
# +--------+---------------+------+--------+
# |ClientId| Animaux|Animal|isinlist|
# +--------+---------------+------+--------+
# | ALT| [chien, chat]|oiseau| false|
# | ALT| [oiseau]|oiseau| true|
# | TDR|[poule, poulet]| poule| true|
# | ALT| [ours]| chien| false|
# | ALT| [paon]| tigre| false|
# | TDR| [tigre, lion]| lion| true|
# | ALT| [chat]| chien| false|
# +--------+---------------+------+--------+
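The same expression also works for filtering when you only need the matching rows rather than a boolean column. A minimal sketch, assuming the same test DataFrame as above (the expr form is used because the second argument is another column, while pyspark.sql.functions.array_contains has traditionally expected a literal value):
from pyspark.sql.functions import expr
# Keep only the rows where Animal occurs in the Animaux array
test.filter(expr("array_contains(Animaux, Animal)")).show()
# This should keep the three rows flagged true above:
# (ALT, [oiseau], oiseau), (TDR, [poule, poulet], poule), (TDR, [tigre, lion], lion)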
Source: How to filter Spark dataframe if one column is a member of another column, by zero323 (Scala).