PYSPARK: how to visualize a GraphFrame?

喜欢而已 提交于 2020-01-11 11:48:07

问题


Suppose that I have created the following graph. My question is how can I visualize it?

 # Create a Vertex DataFrame with unique ID column "id"
    v = sqlContext.createDataFrame([
      ("a", "Alice", 34),
      ("b", "Bob", 36),
      ("c", "Charlie", 30),
    ], ["id", "name", "age"])
    # Create an Edge DataFrame with "src" and "dst" columns
    e = sqlContext.createDataFrame([
      ("a", "b", "friend"),
      ("b", "c", "follow"),
      ("c", "b", "follow"),
    ], ["src", "dst", "relationship"])
    # Create a GraphFrame
    from graphframes import *
    g = GraphFrame(v, e)

回答1:


I couldn't find any native GraphFrame library that visualizes data either.

Nevertheless, you could try to do it from DataBricks with the display() function. You can see an example here.

Also, you can try to transform the GraphFrame to python lists and use the matplotlib or the Pygraphviz libraries.




回答2:


Using Python/PySpark/Jupyter I am using the draw functionality from the networkx library. The trick is to create a networkx graph from the grapheframe graph

import networkx as nx
from graphframes import GraphFrame

def PlotGraph(edge_list):
    Gplot=nx.Graph()
    for row in edge_list.select('src','dst').take(1000):
        Gplot.add_edge(row['src'],row['dst'])

    plt.subplot(121)
    nx.draw(Gplot, with_labels=True, font_weight='bold')


spark = SparkSession \
    .builder \
    .appName("PlotAPp") \
    .getOrCreate()

sqlContext = SQLContext(spark)

vertices = sqlContext.createDataFrame([
  ("a", "Alice", 34),
  ("b", "Bob", 36),
  ("c", "Charlie", 30),
  ("d", "David", 29),
  ("e", "Esther", 32),
("e1", "Esther2", 32),
  ("f", "Fanny", 36),
  ("g", "Gabby", 60),
    ("h", "Mark", 61),
    ("i", "Gunter", 62),
    ("j", "Marit", 63)], ["id", "name", "age"])

edges = sqlContext.createDataFrame([
  ("a", "b", "friend"),
  ("b", "a", "follow"),
  ("c", "a", "follow"),
  ("c", "f", "follow"),
  ("g", "h", "follow"),
  ("h", "i", "friend"),
  ("h", "j", "friend"),
  ("j", "h", "friend"),
    ("e", "e1", "friend")
], ["src", "dst", "relationship"])

g = GraphFrame(vertices, edges)
PlotGraph(g.edges)

plot of some graph



来源:https://stackoverflow.com/questions/45720931/pyspark-how-to-visualize-a-graphframe

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!