问题
In the book "Programming In Scala", chapter 23, the author give an example like:
case class Book(title: String, authors: String*)
val books: List[Book] = // list of books, omitted here
// find all authors who have published at least two books
for (b1 <- books; b2 <- books if b1 != b2;
a1 <- b1.authors; a2 <- b2.authors if a1 == a2)
yield a1
The author said, this will translated into:
books flatMap (b1 =>
books filter (b2 => b1 != b2) flatMap (b2 =>
b1.authors flatMap (a1 =>
b2.authors filter (a2 => a1 == a2) map (a2 =>
a1))))
But if you look into the map and flatmap method definition(TraversableLike.scala), you may find, they are defined as for loops:
def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That = {
val b = bf(repr)
b.sizeHint(this)
for (x <- this) b += f(x)
b.result
}
def flatMap[B, That](f: A => Traversable[B])(implicit bf: CanBuildFrom[Repr, B, That]): That = {
val b = bf(repr)
for (x <- this) b ++= f(x)
b.result
}
Well, I guess this for will continually be translated to foreach and then translated to while statement which is a construct not an expression, scala doesn't have a for construct, because it wants the for always yield something.
So, what I want to discuss with you is that, why does Scala do this "For translation" ?
The author's example used 4 generators, which will be translated into 4 level nested for loop in the end, I think it'll have really horrible performance when the books
is large.
Scala encourage people to use this kind of "Syntactic Sugar", you can always see codes that heavily make use of filter, map and flatmap, which seems programmers are forgetting what they really do is nesting one loop inside another, and what achieved is only to make codes looks a bit shorter. What's your idea?
回答1:
For comprehensions are syntactic sugar for monadic transformation, and, as such, are useful in all sorts of places. At that, they are much more verbose in Scala than the equivalent Haskell construct (of course, Haskell is non-strict by default, so one can't talk about performance of the construct like in Scala).
Also important, this construct keeps what is being done clear, and avoids quickly escalating indentation or unnecessary private method nesting.
As to the final consideration, whether that hides the complexity or not, I'll posit this:
for {
b1 <- books
b2 <- books
if b1 != b2
a1 <- b1.authors
a2 <- b2.authors
if a1 == a2
} yield a1
It is very easy to see what is being done, and the complexity is clear: b^2 * a^2 (the filter won't alter the complexity), for number of books and number of authors. Now, write the same code in Java, either with deep indentation or with private methods, and try to ascertain, in a quick look, what the complexity of the code is.
So, imho, this doesn't hide the complexity, but, on the contrary, makes it clear.
As for the map
/flatMap
/filter
definitions you mention, they do not belong to List
or any other class, so they won't be applied. Basically,
for(x <- List(1, 2, 3)) yield x * 2
is translated into
List(1, 2, 3) map (x => x * 2)
and that is not the same thing as
map(List(1, 2, 3), ((x: Int) => x * 2)))
which is how the definition you passed would be called. For the record, the actual implementation of map
on List
is:
def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That = {
val b = bf(repr)
b.sizeHint(this)
for (x <- this) b += f(x)
b.result
}
回答2:
I write code so that it's easy to understand and maintain. I then profile. If there's a bottleneck that's where I devote my attention. If it's in something like you've described I'll attack the problem in a different manner. Until then, I love the "sugar." It saves me the trouble of writing things out or thinking hard about it.
回答3:
There are actually 6 loops. One loop for each filter/flatMap/map
The filter->map pairs can be done in one loop by using lazy views of the collections (iterator method)
In general, tt is running 2 nested loops for books to find all book pairs and then two nested loops to find if the author of one book is in the list of authors of the other.
Using simple data structures, you would do the same when coding explicitly.
And of course, the example here is to show a complex 'for' loop, not to write the most efficient code. E.g., instead of a sequence of authors, one could use a Set and then find if the intersection is non empty:
for (b1 <- books; b2 <- books; a <- (b1.authors & b2.authors)) yield a
回答4:
Note that in 2.8, the filter
call was changed to withFilter
which is lazy and would avoid constructing an intermediate structure. See guide to move from filter to withFilter?.
I believe the reason that for
is translated to map
, flatMap
and withFilter
(as well as value definitions if present) is to make the use of monads easier.
In general I think if the computation you are doing involves looping 4 times, it is fine using the for
loop. If the computation can be done more efficiently and performance is important then you should use the more efficient algorithm.
回答5:
One follow-up to @IttayD's answer on the algorithm's efficiency. It's worth noting that the algorithm in the original post (and in the book) is a nested loop join. In practice, this isn't an efficient algorithm for large datasets, and most databases would use a hash aggregate here instead. In Scala, a hash aggregate would look something like:
(for (book <- books;
author <- book.authors) yield (book, author)
).groupBy(_._2).filter(_._2.size > 1).keys
来源:https://stackoverflow.com/questions/4211103/scala-for-comprehension-efficiency