I am trying to filter inside map function. Basically the way I\'ll do that in classic map-reduce is mapper wont write anything to context when filter criteria meet. How can I ac
There are few options:
rdd.flatMap
: rdd.flatMap
will flatten a Traversable
collection into the RDD. To pick elements, you'll typically return an Option
as result of the transformation.
rdd.flatMap(elem => if (filter(elem)) Some(f(elem)) else None)
rdd.collect(pf: PartialFunction)
allows you to provide a partial function that can filter and transform elements from the original RDD. You can use all power of pattern matching with this method.
rdd.collect{case t if (cond(t)) => f(t)}
rdd.collect{case t:GivenType => f(t)}
As Dean Wampler mentions in the comments, rdd.map(f(_)).filter(cond(_))
might be as good and even faster than the other more 'terse' options mentioned above.
Where f
is a transformation (or map) function.