Can there be any reason to prefer filter+map:
list.filter(i => aCondition(i)).map(i => fun(i))
over collect?
Most of Scala's collections eagerly apply operations and (unless you're using a macro library that does this for you) will not fuse operations. So filter followed by map will usually create two collections (and even if you use Iterator or somesuch, the intermediate form will be transiently created, albeit only an element at a time), whereas collect will not.
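To make the eager, unfused behavior concrete, here is a small sketch that records the order in which the test and the mapping run on a strict List. The trace helper, the logging predicate and the logging mapping are all made up for illustration; the orderings in the comments are what I would expect on Scala 2.13 or 3.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical helper: hands a logging predicate and a logging mapping to `run`
// and returns the result together with the order of calls.
def trace(run: (Int => Boolean, Int => Int) => List[Int]): (List[Int], List[String]) = {
  val log = ListBuffer.empty[String]
  val p = (x: Int) => { log += s"test $x"; x % 2 == 0 }
  val f = (x: Int) => { log += s"map $x"; x * 10 }
  (run(p, f), log.toList)
}

val xs = List(1, 2, 3, 4)

// filter finishes a whole pass (building an intermediate List) before map starts:
// test 1, test 2, test 3, test 4, map 2, map 4
val (viaFilterMap, filterMapLog) = trace((p, f) => xs.filter(p).map(f))

// collect tests and maps each element in a single pass:
// test 1, test 2, map 2, test 3, test 4, map 4
val (viaCollect, collectLog) = trace((p, f) => xs.collect { case x if p(x) => f(x) })
```

Both calls return List(20, 40); only the order of work, and the intermediate collection, differ.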
On the other hand, collect uses a partial function to implement the joint test, and partial functions are slower than predicates (A => Boolean) at testing whether something is in the collection.
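As a rough illustration of what "a partial function implementing the joint test" means, the argument to collect can be written out as a named PartialFunction. The names p, f and pf below are hypothetical; this only shows the mechanism and says nothing about speed.

```scala
// Hypothetical predicate and mapping, named for illustration only.
val p: Int => Boolean = _ % 2 == 0
val f: Int => String = x => s"<$x>"

// The guard plays the role of the filter; the right-hand side plays the role of the map.
val pf: PartialFunction[Int, String] = { case x if p(x) => f(x) }

val definedAtTwo = pf.isDefinedAt(2)    // true: 2 passes the guard
val definedAtThree = pf.isDefinedAt(3)  // false: 3 does not

// collect keeps exactly the elements where pf is defined and applies it to them.
val collected = List(1, 2, 3, 4).collect(pf)  // List("<2>", "<4>")
```

A plain predicate only has to return a Boolean, whereas the partial function carries both the test and the result, which is where the extra overhead comes from.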
Additionally, there can be cases where it is simply clearer to read one than the other and you don't care about performance or memory usage differences of a factor of 2 or so. In that case, use whichever is clearer. Generally if you already have the functions named, it's clearer to read
xs.filter(p).map(f)
than
xs.collect{ case x if p(x) => f(x) }
but if you are supplying the closures inline, collect generally looks cleaner:
xs.filter(x => foo(x, x)).map(x => bar(x, x))
xs.collect{ case x if foo(x, x) => bar(x, x) }
even though it's not necessarily shorter, because you only refer to the variable once.
Now, how big is the difference in performance? That varies, but if we consider a collection like this:
val v = Vector.tabulate(10000)(i => ((i%100).toString, (i%7).toString))
and you want to pick out the second entry based on filtering the first (so the filter and map operations are both really easy), then we get the following table.
Note: one can get lazy views into collections and gather operations there. You don't always get your original type back, but you can always use the to method to get the right collection type. So xs.view.filter(p).map(f).toVector would, because of the view, not create an intermediate. That is tested below also. It has also been suggested that one can write xs.flatMap(x => if (p(x)) Some(f(x)) else None) and that this is efficient. That is not so. It's also tested below. And one can avoid the partial function by explicitly creating a builder: val vb = Vector.newBuilder[String]; xs.foreach(x => if (p(x)) vb += f(x)); vb.result()
, and the results for that are also listed below.
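For reference, here are the five variants side by side on the Vector defined above, as a sketch only. The predicate p is an assumption (the exact test used for the timings isn't given here); f picks out the second entry as described.

```scala
val v: Vector[(String, String)] =
  Vector.tabulate(10000)(i => ((i % 100).toString, (i % 7).toString))

// Assumed predicate, for illustration: keep pairs whose first entry has two digits.
def p(x: (String, String)): Boolean = x._1.length == 2
// Pick out the second entry.
def f(x: (String, String)): String = x._2

val viaFilterMap = v.filter(p).map(f)                             // builds an intermediate Vector
val viaCollect   = v.collect { case x if p(x) => f(x) }           // single pass, partial function
val viaView      = v.view.filter(p).map(f).toVector               // lazy view, forced at the end
val viaFlatMap   = v.flatMap(x => if (p(x)) Some(f(x)) else None) // wraps each kept result in an Option
val viaBuilder   = {
  val vb = Vector.newBuilder[String]
  v.foreach(x => if (p(x)) vb += f(x))                            // explicit builder, no partial function
  vb.result()
}
```

All five produce the same Vector[String]; they differ only in how much intermediate work and allocation happens along the way.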
In the table below, three conditions have been tested: filter out nothing, filter out half, filter out everything. The times have been normalized to filter/map (100% = same time as filter/map, lower is better). Error bounds are around ±3%.
       Performance of different filter/map alternatives
======================== Vector =========================
filter/map   collect   view filt/map   flatMap   builder
   100%        44%         64%          440%       30%     filter out none
   100%        60%         76%          605%       42%     filter out half
   100%       112%        103%         1300%       74%     filter out all
Thus, filter/map and collect are generally pretty close (with collect winning when you keep a lot), flatMap is far slower in all situations, and creating a builder always wins. (This is true specifically for Vector. Other collections may have somewhat different characteristics, but the trends for most will be similar because the differences in operations are similar.) Views in this test tend to be a win, but they don't always work seamlessly (and they aren't really better than collect except for the empty case).
So, bottom line: prefer filter then map if it aids clarity when speed doesn't matter, or prefer it for speed when you're filtering out almost everything but still want to keep things functional (so don't want to use a builder); and otherwise use collect.