I have a PySpark dataframe, trips, on which I am performing aggregations. For each PULocationID, I am first computing the average of total_amount.
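For reference, that first aggregation step might look something like the sketch below. This assumes trips has the columns named above; avg_by_pu and avg_total_amount are placeholder names I've chosen for illustration.

```python
from pyspark.sql import functions as F

# Average total_amount per pickup location (column names as in the question).
avg_by_pu = (
    trips
    .groupBy('PULocationID')
    .agg(F.avg('total_amount').alias('avg_total_amount'))
)
```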
Here is the line of code that solved the problem:
cnt_cond(col('DOLocationID').isin([i['DOLocationID'] for i in mtrips.collect()])).alias('trips_to_pop')
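For context, here is one way the full aggregation might be assembled around that expression. This is a sketch under assumptions: cnt_cond is taken to be the usual conditional-count helper (a sum of 1 wherever the condition holds), mtrips is assumed to be a dataframe whose DOLocationID column lists the drop-off locations of interest, and popular_ids and avg_total_amount are names introduced here for illustration.

```python
from pyspark.sql import functions as F
from pyspark.sql.functions import col

# Assumed helper: count rows where a condition is true.
cnt_cond = lambda cond: F.sum(F.when(cond, 1).otherwise(0))

# Assumed: mtrips holds the drop-off locations of interest. collect() pulls
# the rows to the driver so the IDs can be passed to isin().
popular_ids = [row['DOLocationID'] for row in mtrips.collect()]

result = trips.groupBy('PULocationID').agg(
    F.avg('total_amount').alias('avg_total_amount'),
    cnt_cond(col('DOLocationID').isin(popular_ids)).alias('trips_to_pop'),
)
```

Note that collecting mtrips to the driver is only reasonable if it is small; for a larger set of drop-off IDs, a join (possibly broadcast) against trips would usually be the better choice.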