Question
I'm trying to create a Graph using some Google Web Graph data which can be found here:
https://snap.stanford.edu/data/web-Google.html
import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
val textFile = sc.textFile("hdfs://n018-data.hursley.ibm.com/user/romeo/web-Google.txt")
val arrayForm = textFile.filter(_.charAt(0)!='#').map(_.split("\\s+")).cache()
val nodes = arrayForm.flatMap(array => array).distinct().map(_.toLong)
val edges = arrayForm.map(line => Edge(line(0).toLong,line(1).toLong))
val graph = Graph(nodes,edges)
Unfortunately, I get this error:
<console>:27: error: type mismatch;
found : org.apache.spark.rdd.RDD[Long]
required: org.apache.spark.rdd.RDD[(org.apache.spark.graphx.VertexId, ?)]
Error occurred in an application involving default arguments.
val graph = Graph(nodes,edges)
So how can I create a VertexId object? As I understand it, passing a Long should be sufficient.
Any ideas?
Thanks a lot!
romeo
Answer 1:
Not exactly. If you take a look at the signature of the apply method of the Graph object, you'll see something like this (for the full signature, see the API docs):
apply[VD, ED](
vertices: RDD[(VertexId, VD)], edges: RDD[Edge[ED]], defaultVertexAttr: VD)
As the description says:
Construct a graph from a collection of vertices and edges with attributes.
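Because of that, you cannot simply pass RDD[Long] as the vertices argument (RDD[Edge[Nothing]] as edges won't work either). Each vertex needs an attribute paired with its ID, and each edge needs an attribute type, for example: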
import scala.{Option, None}
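// An explicit lambda is needed here: in (_.toLong, None) the placeholder
// would expand inside the tuple element, and the call would not compile.
val nodes: RDD[(VertexId, Option[String])] = arrayForm.
flatMap(array => array).
map(id => (id.toLong, Option.empty[String]))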
val edges: RDD[Edge[String]] = arrayForm.
map(line => Edge(line(0).toLong, line(1).toLong, ""))
Note that, according to the API docs, "Duplicate vertices are picked arbitrarily", so calling .distinct() on nodes is unnecessary in this case.
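To make the steps above concrete, here is a minimal sketch that goes from raw edge-list lines to a Graph with vertex and edge attributes. The object name AttributedGraphSketch, the helper names, the in-memory sample lines, and the local[*] master are all illustrative assumptions, not part of the original question. The filter also differs slightly from the question's: it trims each line and skips blank ones.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical wrapper object, used only for illustration.
object AttributedGraphSketch {

  // Split "src<whitespace>dst" lines into fields, skipping '#' comments and blank lines.
  def parse(lines: RDD[String]): RDD[Array[String]] =
    lines
      .map(_.trim)
      .filter(line => line.nonEmpty && !line.startsWith("#"))
      .map(_.split("\\s+"))

  def build(lines: RDD[String]): Graph[Option[String], String] = {
    val arrayForm = parse(lines).cache()

    // Vertices must be (VertexId, attribute) pairs; here the attribute is an empty Option.
    val nodes: RDD[(VertexId, Option[String])] =
      arrayForm.flatMap(array => array).map(id => (id.toLong, Option.empty[String]))

    // Edges carry an attribute as well; an empty String serves as a placeholder.
    val edges: RDD[Edge[String]] =
      arrayForm.map(line => Edge(line(0).toLong, line(1).toLong, ""))

    Graph(nodes, edges)
  }

  def main(args: Array[String]): Unit = {
    // Local master and tiny sample data are assumptions for a quick try-out.
    val conf = new SparkConf().setAppName("attributed-graph-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val sample = sc.parallelize(Seq("# FromNodeId\tToNodeId", "0\t1", "0\t2", "1\t2"))
      val graph = build(sample)
      println(s"vertices=${graph.vertices.count()}, edges=${graph.edges.count()}")
    } finally {
      sc.stop()
    }
  }
}
```

In spark-shell, where sc already exists, something like AttributedGraphSketch.build(sc.textFile(path)) should work, with path pointing at the downloaded edge list.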
If you want to create a Graph without attributes, you can use Graph.fromEdgeTuples.
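As a rough illustration of the Graph.fromEdgeTuples route, the sketch below builds the graph directly from (source, destination) pairs. The object name EdgeTupleGraphSketch and the default vertex value 1 are arbitrary choices made for this example. With this builder the vertex set is derived from the edges, every vertex receives the supplied default value, and the resulting edge attribute type is Int.

```scala
import org.apache.spark.graphx.{Graph, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical wrapper object, used only for illustration.
object EdgeTupleGraphSketch {

  def build(lines: RDD[String]): Graph[Int, Int] = {
    // Turn "src<whitespace>dst" lines into (VertexId, VertexId) pairs,
    // skipping '#' comments and blank lines.
    val rawEdges: RDD[(VertexId, VertexId)] = lines
      .map(_.trim)
      .filter(line => line.nonEmpty && !line.startsWith("#"))
      .map(_.split("\\s+"))
      .map(parts => (parts(0).toLong, parts(1).toLong))

    // No separate vertex RDD is needed; 1 is just a placeholder vertex value.
    Graph.fromEdgeTuples(rawEdges, 1)
  }
}
```

This avoids constructing the nodes RDD at all, which may be the simplest option when the vertices have no meaningful attributes.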
Answer 2:
The error message says that nodes must be of type RDD[(Long, anything else)]. The first element of the tuple is the vertexId, and the second element can be anything, for example a String with a node description. Try simply repeating the vertexId:
val nodes = arrayForm
.flatMap(array => array)
.distinct()
.map(x => (x.toLong, x.toLong))
Source: https://stackoverflow.com/questions/31189092/how-to-create-a-vertexid-in-apache-spark-graphx-using-a-long-data-type