How to create a VertexId in Apache Spark GraphX using a Long data type?

Posted by 喜欢而已 on 2019-12-05 13:09:57

Not exactly. If you take a look at the signature of the apply method of the Graph object you'll see something like this (for a full signature see API docs):

apply[VD, ED](
    vertices: RDD[(VertexId, VD)], edges: RDD[Edge[ED]], defaultVertexAttr: VD)

As the method description puts it:

Construct a graph from a collection of vertices and edges with attributes.

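Because of that, you cannot simply pass an RDD[Long] as the vertices argument (an RDD[Edge[Nothing]] as edges won't work either). Each vertex has to be a (VertexId, VD) pair, where VertexId is just a type alias for Long, and each edge needs an attribute of some concrete type: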

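import org.apache.spark.graphx.{Edge, VertexId}
import org.apache.spark.rdd.RDD

// Pair every id with an (empty) attribute; VertexId is an alias for Long.
val nodes: RDD[(VertexId, Option[String])] = arrayForm.
    flatMap(array => array).
    map(id => (id.toLong, Option.empty[String]))

// Each row holds the source and destination id of one edge.
val edges: RDD[Edge[String]] = arrayForm.
    map(line => Edge(line(0).toLong, line(1).toLong, ""))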

Note that:

Duplicate vertices are picked arbitrarily

so calling .distinct() on nodes is unnecessary in this case.
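
Putting the pieces together, here is a minimal sketch of the whole construction. It assumes a local SparkContext and a small in-memory stand-in for arrayForm (taken to be an RDD[Array[String]] of source/destination id pairs); both are assumptions for illustration, not part of the original question. It is meant to be pasted into spark-shell or run as a Scala script.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Assumption: a local SparkContext; in spark-shell an existing context is reused.
val conf = new SparkConf().setAppName("vertex-id-sketch").setMaster("local[*]")
val sc = SparkContext.getOrCreate(conf)

// Assumption: each inner array is one edge, Array(sourceId, destinationId),
// with the ids stored as strings.
val arrayForm: RDD[Array[String]] = sc.parallelize(Seq(
  Array("1", "2"),
  Array("2", "3"),
  Array("3", "1")
))

// Vertices must be (VertexId, attribute) pairs.
// Duplicate ids are deliberately left in; Graph.apply picks one arbitrarily.
val nodes: RDD[(VertexId, Option[String])] = arrayForm
  .flatMap(array => array)
  .map(id => (id.toLong, Option.empty[String]))

// Edges carry an empty String attribute here.
val edges: RDD[Edge[String]] = arrayForm
  .map(line => Edge(line(0).toLong, line(1).toLong, ""))

// The third argument is the attribute given to vertices that appear
// in edges but not in nodes.
val graph: Graph[Option[String], String] =
  Graph(nodes, edges, Option.empty[String])

println(s"vertices: ${graph.vertices.count()}, edges: ${graph.edges.count()}")
```

Even though every id appears twice in nodes, the resulting graph should contain each vertex only once.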

If you want to create a Graph without attributes you can use Graph.fromEdgeTuples.
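
A minimal sketch of that alternative follows, again assuming a local SparkContext and a hypothetical in-memory arrayForm. Graph.fromEdgeTuples takes an RDD of (source, destination) id pairs plus a default vertex attribute; the value 1 used below is an arbitrary placeholder.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, VertexId}
import org.apache.spark.rdd.RDD

// Assumption: a local SparkContext; in spark-shell an existing context is reused.
val conf = new SparkConf().setAppName("edge-tuples-sketch").setMaster("local[*]")
val sc = SparkContext.getOrCreate(conf)

// Assumption: each inner array is one edge, Array(sourceId, destinationId).
val arrayForm: RDD[Array[String]] = sc.parallelize(Seq(
  Array("1", "2"),
  Array("2", "3"),
  Array("3", "1")
))

// Only the endpoints are needed; no vertex RDD has to be built by hand.
val rawEdges: RDD[(VertexId, VertexId)] = arrayForm
  .map(line => (line(0).toLong, line(1).toLong))

// Every vertex mentioned in an edge gets the placeholder attribute 1.
// The edge attribute type is fixed to Int by fromEdgeTuples.
val simpleGraph: Graph[Int, Int] = Graph.fromEdgeTuples(rawEdges, 1)

println(s"vertices: ${simpleGraph.vertices.count()}, edges: ${simpleGraph.edges.count()}")
```

This avoids inventing a vertex attribute type when only the topology matters.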

The error message says that nodes must be of type RDD[(Long, anything else)]. The first element of the tuple is the vertex id, and the second element can be anything, for example a String with a node description. Try simply repeating the vertex id:

val nodes = arrayForm
    .flatMap(array => array)
    .distinct()
    .map(x => (x.toLong, x.toLong))