Question
Here is my program:
static class Vprog extends AbstractFunction3<Object, OddRange, OddRange, OddRange> implements Serializable {
    @Override
    public OddRange apply(Object l, OddRange self, OddRange sumOdd) {
        System.out.println(self.getS() + self.getI() + " ---> " + sumOdd.getS() + sumOdd.getI());
        self.setS(sumOdd.getS() + self.getS());
        self.setI(self.getI() + sumOdd.getI());
        return new OddRange(self.getS(), self.getI());
    }
}
The question is this: if I use return new OddRange(...) in class Vprog, as above, the vertexRDD is updated.
But if I use return self, like this:
static class Vprog extends AbstractFunction3<Object, OddRange, OddRange, OddRange> implements Serializable {
    @Override
    public OddRange apply(Object l, OddRange self, OddRange sumOdd) {
        System.out.println(self.getS() + self.getI() + " ---> " + sumOdd.getS() + sumOdd.getI());
        self.setS(sumOdd.getS() + self.getS());
        self.setI(self.getI() + sumOdd.getI());
        return self;
    }
}
The vertexRDD doesn't change. I know an RDD is immutable, but how can I update the vertexRDD in spark.graphx.pregel correctly? Can you give me any advice?
I have found a similar question: Spark Pregel is not working with Java. But I use Spark 2.3.0; maybe it has the same problem?
Answer 1:
I think I have found the answer:
Vprog must return a new object if we want to change the data that the next sendMsg will use.
That's because Vprog changes the vertexRDD, but sendMsg uses the tripletsRDD. What's more, the vertices in the tripletsRDD are not the same objects as those in the vertexRDD; they are just a copy of the vertexRDD. So the problem is when the vertices in the tripletsRDD get updated after the vertexRDD has changed.
We can follow the source below to find out the reason:
First part: pregel (in Pregel.scala) -> joinVertices (in GraphOps.scala) -> outerJoinVertices (in GraphImpl.scala) -> diff (in VertexRDDImpl.scala)
And then:
Second part: pregel (in Pregel.scala) -> mapReduceTriplets (in GraphXUtils.scala) -> aggregateMessagesWithActiveSet (in GraphImpl.scala).
In the first part, I found that once Vprog has run, the VertexRDD data from before and after the execution are compared. So if Vprog modifies the source object in place, the two will be the same. Then a data structure named replicatedVertexView is updated to store the VertexRDD entries that differ. If they are the same, nothing is stored.
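The comparison described above can be modelled in plain Java without Spark. This is only a rough sketch of the idea, not GraphX code: VertexDiffSketch, countChanged and sampleVertices are hypothetical names, and OddRange is a stand-in that assumes two int fields and no overridden equals(), since the question does not show the real class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

public class VertexDiffSketch {

    // Hypothetical stand-in for the question's OddRange; both fields are assumed to be ints.
    // equals() is deliberately not overridden, so comparison falls back to reference identity.
    static final class OddRange {
        private int s;
        private int i;
        OddRange(int s, int i) { this.s = s; this.i = i; }
        int getS() { return s; }
        int getI() { return i; }
        void setS(int s) { this.s = s; }
        void setI(int i) { this.i = i; }
    }

    // Mirrors the second Vprog from the question: mutate self and return it.
    static OddRange mutateInPlace(OddRange self, OddRange msg) {
        self.setS(self.getS() + msg.getS());
        self.setI(self.getI() + msg.getI());
        return self;
    }

    // Builds a fresh object and leaves self untouched.
    static OddRange returnNew(OddRange self, OddRange msg) {
        return new OddRange(self.getS() + msg.getS(), self.getI() + msg.getI());
    }

    // Toy model of "apply vprog, then diff": run vprog on every vertex and
    // count the positions where the old and new values are not equal.
    static int countChanged(List<OddRange> vertices, OddRange msg, BinaryOperator<OddRange> vprog) {
        List<OddRange> updated = new ArrayList<>();
        for (OddRange v : vertices) {
            updated.add(vprog.apply(v, msg));
        }
        int changed = 0;
        for (int k = 0; k < vertices.size(); k++) {
            if (!vertices.get(k).equals(updated.get(k))) {
                changed++;
            }
        }
        return changed;
    }

    // Two sample vertices with arbitrary values.
    static List<OddRange> sampleVertices() {
        List<OddRange> vs = new ArrayList<>();
        vs.add(new OddRange(1, 1));
        vs.add(new OddRange(3, 2));
        return vs;
    }

    public static void main(String[] args) {
        OddRange msg = new OddRange(10, 1);
        // In-place mutation: old and new are the same object, so nothing is reported as changed.
        System.out.println("in place:   " + countChanged(sampleVertices(), msg, VertexDiffSketch::mutateInPlace));
        // New object: every vertex is reported as changed.
        System.out.println("new object: " + countChanged(sampleVertices(), msg, VertexDiffSketch::returnNew));
    }
}
```

In this model the in-place variant reports 0 changed vertices and the new-object variant reports 2, which matches the behaviour described in the question.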
In the second part, the tripletsRDD is updated with the information stored in the replicatedVertexView, and sendMsg then uses that tripletsRDD.
So if Vprog doesn't return a new object, the tripletsRDD will not be updated along with the VertexRDD, and the results will be wrong.
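One way to make the "return a new object" rule hard to break is to keep the vertex value immutable, so that Vprog cannot modify self at all. The sketch below is plain Java with no Spark dependency; ImmutableVprogSketch and its vprog method are hypothetical names, and OddRange is again assumed to hold two ints. It may also matter that the first Vprog in the question still calls the setters on self before returning the copy: if OddRange compared by value in equals(), the old and new values could then look identical to a before/after comparison. That is an assumption about the real class, not something verified against Spark.

```java
public class ImmutableVprogSketch {

    // Hypothetical immutable variant of OddRange: final fields, no setters.
    static final class OddRange {
        private final int s;
        private final int i;
        OddRange(int s, int i) { this.s = s; this.i = i; }
        int getS() { return s; }
        int getI() { return i; }
    }

    // Same arithmetic as the question's Vprog, but self is only read, never written.
    static OddRange vprog(OddRange self, OddRange sumOdd) {
        return new OddRange(self.getS() + sumOdd.getS(), self.getI() + sumOdd.getI());
    }

    public static void main(String[] args) {
        OddRange self = new OddRange(1, 1);
        OddRange updated = vprog(self, new OddRange(10, 1));
        // self keeps its old values; updated is a distinct object carrying the sums.
        System.out.println(self.getS() + "," + self.getI() + " -> " + updated.getS() + "," + updated.getI());
    }
}
```

In a real Pregel job the same body would go inside the apply method of the AbstractFunction3 subclass shown in the question.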
Source: https://stackoverflow.com/questions/61380532/whats-the-difference-between-change-input-arguments-and-creating-a-new-object-i