PYSPARK: how to visualize a GraphFrame?

微笑、不失礼 提交于 2019-12-02 07:57:04

I couldn't find any native GraphFrame library that visualizes data either.

Nevertheless, you could try to do it from DataBricks with the display() function. You can see an example here.

Also, you can try to transform the GraphFrame to python lists and use the matplotlib or the Pygraphviz libraries.

Using Python/PySpark/Jupyter I am using the draw functionality from the networkx library. The trick is to create a networkx graph from the grapheframe graph

import networkx as nx
from graphframes import GraphFrame

def PlotGraph(edge_list):
    Gplot=nx.Graph()
    for row in edge_list.select('src','dst').take(1000):
        Gplot.add_edge(row['src'],row['dst'])

    plt.subplot(121)
    nx.draw(Gplot, with_labels=True, font_weight='bold')


spark = SparkSession \
    .builder \
    .appName("PlotAPp") \
    .getOrCreate()

sqlContext = SQLContext(spark)

vertices = sqlContext.createDataFrame([
  ("a", "Alice", 34),
  ("b", "Bob", 36),
  ("c", "Charlie", 30),
  ("d", "David", 29),
  ("e", "Esther", 32),
("e1", "Esther2", 32),
  ("f", "Fanny", 36),
  ("g", "Gabby", 60),
    ("h", "Mark", 61),
    ("i", "Gunter", 62),
    ("j", "Marit", 63)], ["id", "name", "age"])

edges = sqlContext.createDataFrame([
  ("a", "b", "friend"),
  ("b", "a", "follow"),
  ("c", "a", "follow"),
  ("c", "f", "follow"),
  ("g", "h", "follow"),
  ("h", "i", "friend"),
  ("h", "j", "friend"),
  ("j", "h", "friend"),
    ("e", "e1", "friend")
], ["src", "dst", "relationship"])

g = GraphFrame(vertices, edges)
PlotGraph(g.edges)

plot of some graph

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!