Spark GraphX: add multiple edge weights

情书的邮戳 2021-01-13 14:47

I am new to GraphX and have a Spark DataFrame with four columns, as shown below:

src_ip    dst_ip    flow_count   sum_byt         


        
1 Answer
  • 2021-01-13 15:26

    It is possible to add both variables to the edge. The simplest solution would be to use a tuple, for example:

    import org.apache.spark.graphx.Edge
    import org.apache.spark.rdd.RDD
    val data = Array(Edge(3L, 7L, (123, 456)), Edge(5L, 3L, (41, 34)))
    val edges: RDD[Edge[(Int, Int)]] = spark.sparkContext.parallelize(data)
    

    Alternatively, you can make use of a case class:

    case class EdgeWeight(flow_count: Int, sum_bytes: Int)
    
    val data2 = Array(Edge(3L, 7L, EdgeWeight(123, 456)), Edge(5L, 3L, EdgeWeight(41, 34)))
    val edges2: RDD[Edge[EdgeWeight]] = spark.sparkContext.parallelize(data2)
    

    A case class is more convenient to use and maintain when more attributes need to be added, because each attribute is accessed by name rather than by tuple position.
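To illustrate how the named fields pay off once the edges are in a graph, here is a minimal sketch. It assumes a local SparkSession and reuses the sample edges from above; the object name `EdgeWeightDemo` and the helper `bytesPerFlow` are hypothetical names introduced only for this example.

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Defined at top level so Spark can serialize it cleanly.
case class EdgeWeight(flow_count: Int, sum_bytes: Int)

object EdgeWeightDemo {
  // Hypothetical helper: average bytes per flow for one edge attribute.
  def bytesPerFlow(w: EdgeWeight): Double = w.sum_bytes.toDouble / w.flow_count

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for a demo.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("edge-weight-demo")
      .getOrCreate()

    val edges: RDD[Edge[EdgeWeight]] = spark.sparkContext.parallelize(Seq(
      Edge(3L, 7L, EdgeWeight(123, 456)),
      Edge(5L, 3L, EdgeWeight(41, 34))
    ))

    // Vertices are created from the edge endpoints with a placeholder attribute.
    val graph = Graph.fromEdges(edges, defaultValue = 0)

    // Filter on one weight by name, then derive a new edge attribute from both.
    val busy = graph.edges.filter(e => e.attr.flow_count > 100)
    val ratios = graph.mapEdges(e => bytesPerFlow(e.attr))

    busy.collect().foreach(println)
    ratios.edges.collect().foreach(println)

    spark.stop()
  }
}
```

With a tuple, the same filter would read `e.attr._1 > 100`, which is harder to follow once there are more than two weights.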


    I believe that in this specific case, it is most elegantly solved by:

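    import scala.util.hashing.MurmurHash3

    // Assumes trafficsFromTo is an RDD[Row] (e.g. df.rdd) whose last two columns are Int
    val trafficEdges = trafficsFromTo.map { x =>
      Edge(MurmurHash3.stringHash(x(0).toString),
           MurmurHash3.stringHash(x(1).toString),
           EdgeWeight(x.getInt(2), x.getInt(3)))
    }

    // sortBy returns a new RDD, so keep the result
    val sortedEdges = trafficEdges.sortBy(edge => edge.attr.flow_count) // sort by flow_count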
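One caveat about deriving vertex IDs by hashing: `MurmurHash3.stringHash` returns a 32-bit `Int`, so two different address strings can in principle map to the same vertex. The sketch below shows the hash-based ID next to a possible alternative that is collision-free, under the assumption that every address is a well-formed dotted-quad IPv4 string. The object `VertexIds` and both function names are hypothetical, and no input validation is done.

```scala
import scala.util.hashing.MurmurHash3

object VertexIds {
  // Hash-based ID, as in the answer: works for any string, but may collide.
  def hashId(ip: String): Long = MurmurHash3.stringHash(ip).toLong

  // Assumption: ip is a valid dotted-quad IPv4 address such as "10.0.0.1".
  // Each octet becomes one base-256 digit, so distinct addresses get distinct IDs.
  def ipv4Id(ip: String): Long =
    ip.split('.').foldLeft(0L)((acc, octet) => acc * 256 + octet.toLong)
}
```

Either function can replace the inline `MurmurHash3.stringHash(...)` calls when building the `Edge` objects. The numeric form also makes it possible to recover the original address from a vertex ID.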
    