spark - filter within map

忘了有多久 2021-02-08 04:53

I am trying to filter inside a map function. In classic MapReduce I would do this by having the mapper write nothing to the context when the filter criterion is met. How can I ac

1 answer
  •  无人及你
    2021-02-08 05:23

    There are a few options:

    rdd.flatMap: rdd.flatMap flattens the Traversable collection returned for each element into the resulting RDD. To pick elements, you typically return an Option as the result of the transformation: Some(value) keeps the element, None drops it.

    rdd.flatMap(elem => if (filter(elem)) Some(f(elem)) else None)
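
To make the flatMap option concrete, here is a minimal sketch that runs in local mode. It assumes Spark is on the classpath; the object name, the `keep` predicate, and the transformation `f` are hypothetical stand-ins for `filter(elem)` and `f(elem)` above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object FlatMapFilterSketch {
  // Hypothetical predicate: keep even numbers only.
  def keep(n: Int): Boolean = n % 2 == 0

  // Hypothetical transformation applied to the kept elements.
  def f(n: Int): String = s"even-$n"

  // Some(...) emits one output element, None emits nothing,
  // so filtering and mapping happen in a single pass.
  def filterAndMap(rdd: RDD[Int]): RDD[String] =
    rdd.flatMap(n => if (keep(n)) Some(f(n)) else None)

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for running the sketch on one machine.
    val conf = new SparkConf().setAppName("flatMap-filter").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      filterAndMap(sc.parallelize(1 to 6)).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

With the input 1 to 6, only the elements 2, 4, and 6 pass the predicate, so three transformed values are printed.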
    

    rdd.collect(pf: PartialFunction) lets you provide a partial function that both filters and transforms elements of the original RDD. You can use the full power of pattern matching with this method.

    rdd.collect{case t if (cond(t)) => f(t)}
    rdd.collect{case t:GivenType => f(t)}
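
A short sketch of both collect forms, again assuming Spark on the classpath and local mode. Keep in mind that the overload taking a partial function is a transformation that returns a new RDD, unlike the no-argument `collect()` action, which brings the data back to the driver as an array. The object name, method names, and sample data are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CollectPartialSketch {
  // Guard form: keep words longer than three characters and upper-case them.
  def longWordsUpper(rdd: RDD[String]): RDD[String] =
    rdd.collect { case w if w.length > 3 => w.toUpperCase }

  // Type-pattern form: keep only the Int elements of a mixed RDD and double them.
  def doubledInts(rdd: RDD[Any]): RDD[Int] =
    rdd.collect { case n: Int => n * 2 }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for running the sketch on one machine.
    val conf = new SparkConf().setAppName("collect-partial").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val words = sc.parallelize(Seq("map", "filter", "rdd", "spark"))
      val mixed = sc.parallelize(Seq[Any](1, "a", 2.5, 3))
      // The trailing collect() is the action that returns results to the driver.
      longWordsUpper(words).collect().foreach(println)
      doubledInts(mixed).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Elements for which the partial function is not defined are simply dropped, which is what makes this form a combined filter and map.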
    

    As Dean Wampler mentions in the comments, rdd.map(f(_)).filter(cond(_)) might be just as good as, or even faster than, the more 'terse' options mentioned above.

    Here, f is a transformation (or map) function.
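
One detail worth checking when chaining map and filter is the order: in `rdd.map(f(_)).filter(cond(_))` the predicate sees the output of f, whereas `rdd.filter(cond(_)).map(f(_))` tests the original elements, which is the closer match to the flatMap and collect forms. Both are narrow transformations that Spark pipelines within a single stage. The sketch below, with a hypothetical `f` and `cond` and assuming local mode, shows that the two orderings can give different results.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object MapFilterOrderSketch {
  // Hypothetical transformation and predicate.
  def f(n: Int): Int = n * n
  def cond(n: Int): Boolean = n > 10

  // cond is evaluated on f's output (the squares).
  def mapThenFilter(rdd: RDD[Int]): RDD[Int] = rdd.map(f).filter(cond)

  // cond is evaluated on the original elements, then f is applied.
  def filterThenMap(rdd: RDD[Int]): RDD[Int] = rdd.filter(cond).map(f)

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for running the sketch on one machine.
    val conf = new SparkConf().setAppName("map-filter-order").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(1 to 12)
      println(mapThenFilter(rdd).collect().mkString(", "))
      println(filterThenMap(rdd).collect().mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

For the input 1 to 12, the first form keeps every square above 10 (the squares of 4 through 12), while the second keeps only the squares of 11 and 12.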
