问题
given a list of documents, I want to obtain the pairs that shares at least one token. To do this I wrote the code below, that do that through an inverted index.
object TestFlatMap {
case class Document(id : Int, tokens : List[String])
def main(args: Array[String]): Unit = {
val documents = List(
Document(1, List("A", "B", "C", "D")),
Document(2, List("A", "B", "E", "F", "G")),
Document(3, List("E", "G", "H")),
Document(4, List("A", "L", "M", "N"))
)
val expectedTokensIds = List(("A",1), ("A",2), ("A",4), ("B",1), ("B",2), ("C",1), ("D",1), ("E",2), ("E",3), ("F",2), ("G",2), ("G",3), ("H",3), ("L",4), ("M",4), ("N",4)) //Expected tokens - id tuples
val expectedCouples = Set((1, 2), (1, 4), (2, 3), (2, 4)) //Expected resulting pairs
/**
* For each token returns the id of the documents that contains it
* */
val tokensIds = documents.flatMap{ document =>
document.tokens.map{ token =>
(token, document.id)
}
}
//Check if the tuples are right
assert(tokensIds.length == expectedTokensIds.length && tokensIds.intersect(expectedTokensIds).length == expectedTokensIds.length, "Error: tokens-ids not matches")
//Group the documents by the token
val docIdsByToken = tokensIds.groupBy(_._1).filter(_._2.size > 1)
/**
* For each group of documents generate the pairs
* */
val couples = docIdsByToken.map{ case (token, docs) =>
docs.combinations(2).map{ c =>
val d1 = c.head._2
val d2 = c.last._2
if(d1 < d2){
(d1, d2)
}
else{
(d2, d1)
}
}
}.flatten.toSet
/**
* Same operation, but with flatMap
* For each group of documents generate the pairs
* */
val couples1 = docIdsByToken.flatMap{ case (token, docs) =>
docs.combinations(2).map{ c =>
val d1 = c.head._2
val d2 = c.last._2
if(d1 < d2){
(d1, d2)
}
else{
(d2, d1)
}
}
}.toSet
//The results obtained with flatten pass the test
assert(couples.size == expectedCouples.size && couples.intersect(expectedCouples).size == expectedCouples.size, "Error: couples not matches")
//The results obtained with flatMap do not pass the test: they are wrong
assert(couples1.size == expectedCouples.size && couples1.intersect(expectedCouples).size == expectedCouples.size, "Error: couples1 not matches")
}
The problem is that the flatMap that should generates the final results does not works properly, it only returns two couples: (2,3) and (1,2). I do not understand why it does not works, moreover IntelliJ suggests me to use flatMap instead of use map an then flatten.
Someone is able to explain me where the problem is? Because I cannot figure out, I also had this problem in past.
Thanks
Luca
回答1:
That's an excellent example that demonstrates that all the nice monad laws do not necessarily hold if you switch between different types of collections during the map
/flatMap
/flatten
.
You must convert the Map
to List
, so that keys are not overridden repeatedly while you are constructing another Map
as an intermediate result, because a Map
will override keys, instead of collecting all pairs:
val couples1 = docIdsByToken.toList.flatMap{ case (token, docs) =>
docs.combinations(2).map{ c =>
val d1 = c.head._2
val d2 = c.last._2
if(d1 < d2){
(d1, d2)
}
else{
(d2, d1)
}
}
}.toSet
Here is a much shorter version that demonstrates the same problem:
val m = Map("A" -> (2, 1), "B" -> (2, 3))
val s = m.flatMap{ case (k, v) => List(v) }.toSet
println(s)
Instead of Set((2, 1), (2, 3))
, it will produce Set((2, 3))
, because
after the flatMap
and before the toSet
the intermediate result is a new Map
, and this map can hold only one value for key = 2
.
The difference to the first version is that after map
, you obtain something like an Iterable[List[(Int, Int)]]
, which is not a Map
, and therefore cannot lose/override any keys.
来源:https://stackoverflow.com/questions/49127313/scala-flatmap-provides-wrong-results