Visualizing Undirected Graph That's Too Large for GraphViz?

自闭症患者 2020-11-30 17:18

I need advice for rendering an undirected graph with 178,000 nodes and 500,000 edges. I've tried Neato, Tulip, and Cytoscape. Neato doesn't even come remotely close, and

17 answers
  • 2020-11-30 17:49

    I suggest that you first do some preprocessing of the data, for example collapsing nodes to clusters and then visualizing the clusters. Collapsing will reduce the number of nodes and makes it easier for algorithms such as Kamada-Kawai or Fruchterman-Reingold to render the resulting graph.
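
    A minimal sketch of the collapsing step, assuming you already have a cluster id for every node (from whatever clustering method you prefer). The function name `collapse_to_clusters` and the `cluster_of` mapping are hypothetical, chosen for illustration only:

```python
from collections import Counter


def collapse_to_clusters(edges, cluster_of):
    """Collapse an undirected graph into a weighted cluster graph.

    edges      -- iterable of (u, v) node pairs
    cluster_of -- dict mapping each node to a cluster id (assumed given)
    Returns (sizes, weights): the node count per cluster, and the edge
    count per unordered pair of distinct clusters.
    """
    sizes = Counter(cluster_of.values())
    weights = Counter()
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv:  # edges inside a cluster vanish in the collapsed graph
            weights[tuple(sorted((cu, cv)))] += 1
    return sizes, weights


# Two triangles joined by a single edge collapse to two clusters and one edge.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
cluster_of = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
sizes, weights = collapse_to_clusters(edges, cluster_of)
```

    The cluster sizes and edge weights can then drive node size and edge thickness when you lay out the much smaller collapsed graph.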

    If you really need to visualize the full graph (178,000 nodes and 500,000 edges), consider using a simple circular layout. This is easy to render and avoids the issues that force-based algorithms have. Take a look at Circos: http://mkweb.bcgsc.ca/circos/

    Circos is a visualization tool developed by bioinformatics people, tailored to visualizing genomes and other extremely large and complex data sets.

    It's a Perl-based package; I hope that's not a problem.
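
    To show why a circular layout is cheap, here is a rough sketch of the idea (not Circos code; the function name `circular_layout` is made up for this example). Each position is computed directly from the node's index, so there is no iterative optimization at all:

```python
import math


def circular_layout(nodes, radius=1.0):
    """Place nodes evenly on a circle; returns {node: (x, y)}."""
    n = len(nodes)
    return {
        node: (radius * math.cos(2 * math.pi * i / n),
               radius * math.sin(2 * math.pi * i / n))
        for i, node in enumerate(nodes)
    }


pos = circular_layout(["a", "b", "c", "d"])
```

    The order of `nodes` decides how readable the result is; sorting the nodes by cluster first would probably keep most edges short, but that is an assumption to test on your data.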

  • 2020-11-30 17:49

    First, I would like to second aliekens' suggestion to try sfdp. It is Graphviz's multiscale force-directed layout for large graphs (a scaled-up relative of fdp and Neato).

    As OJW suggests, you could also just plot the nodes in R². Your edges actually supply what he calls a "natural ordering." In particular, you can plot the components of the second and third eigenvectors (ordered by increasing eigenvalue) of the normalized graph Laplacian. This is the matrix L on the Wikipedia page about spectral clustering. You should be able to write down this matrix without understanding the linear algebra behind it. That reduces your problem to approximately computing the first few eigenvectors of a large sparse matrix, which is traditionally done by iterative methods and is implemented in standard linear algebra packages. This method should scale up to very large graphs.
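
    As an illustration of the spectral idea, here is a small pure-Python sketch. It assumes nodes are numbered 0..n-1, the graph is simple and undirected, and no node is isolated. It takes L = I - D^{-1/2} A D^{-1/2} and runs a plain orthogonal power iteration on 2I - L, whose largest eigenvalues correspond to the smallest eigenvalues of L. The function name `spectral_layout` and its parameters are my own. For a graph of your size you would hand the sparse matrix to a real eigensolver (for example `scipy.sparse.linalg.eigsh`) rather than use this loop:

```python
import math
import random


def spectral_layout(n, edges, iterations=500, seed=0):
    """2-D coordinates from eigenvectors 2 and 3 of the normalized Laplacian.

    Assumes nodes 0..n-1, an undirected simple graph, and no isolated nodes.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    inv_sqrt_deg = [1.0 / math.sqrt(len(nbrs)) for nbrs in adj]

    def apply_m(x):
        # M = 2I - L = I + D^{-1/2} A D^{-1/2}
        return [x[i] + inv_sqrt_deg[i] * sum(inv_sqrt_deg[j] * x[j] for j in adj[i])
                for i in range(n)]

    def orthonormalize(x, basis):
        # Remove the components along `basis`, then scale to unit length.
        for b in basis:
            dot = sum(xi * bi for xi, bi in zip(x, b))
            x = [xi - dot * bi for xi, bi in zip(x, b)]
        norm = math.sqrt(sum(xi * xi for xi in x))
        return [xi / norm for xi in x]

    # The first eigenvector of L (eigenvalue 0) is known: sqrt(degree).
    first = orthonormalize([math.sqrt(len(nbrs)) for nbrs in adj], [])
    rng = random.Random(seed)
    second = orthonormalize([rng.uniform(-1, 1) for _ in range(n)], [first])
    third = orthonormalize([rng.uniform(-1, 1) for _ in range(n)], [first, second])
    for _ in range(iterations):
        second = orthonormalize(apply_m(second), [first])
        third = orthonormalize(apply_m(third), [first, second])
    return list(zip(second, third))


# Two triangles joined by one edge: the first coordinate separates them by sign.
coords = spectral_layout(6, [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)])
```

    Each iteration touches every edge once, so the cost per iteration grows linearly with the size of the graph; how many iterations you need depends on the gaps between eigenvalues.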

  • 2020-11-30 17:52

    The Large Graph Layout (LGL) project helped me a lot with a similar problem. It handles the layout and has a small Java app to draw the produced layouts in 2D. There is no vector output out of the box, so you'll have to draw the graph yourself (given the node coordinates produced by LGL).
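
    Drawing it yourself can be fairly short. The sketch below writes an SVG from node coordinates and an edge list. It assumes you have already parsed the coordinates into a dict of `{node: (x, y)}`; I am not reproducing LGL's file format here, and `layout_to_svg` is a hypothetical helper name:

```python
def layout_to_svg(coords, edges, size=800, margin=10):
    """Render node coordinates and edges as an SVG string.

    coords -- dict {node: (x, y)} in arbitrary units (assumed input format)
    edges  -- iterable of (u, v) pairs
    """
    xs = [x for x, _ in coords.values()]
    ys = [y for _, y in coords.values()]
    min_x, min_y = min(xs), min(ys)
    # One scale for both axes keeps the aspect ratio; `or 1.0` avoids
    # dividing by zero when all nodes share a single point.
    span = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    scale = (size - 2 * margin) / span

    def to_px(node):
        x, y = coords[node]
        return margin + (x - min_x) * scale, margin + (y - min_y) * scale

    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">']
    for u, v in edges:
        (x1, y1), (x2, y2) = to_px(u), to_px(v)
        parts.append(f'<line x1="{x1:.1f}" y1="{y1:.1f}" x2="{x2:.1f}" y2="{y2:.1f}" '
                     'stroke="black" stroke-opacity="0.2"/>')
    for node in coords:
        x, y = to_px(node)
        parts.append(f'<circle cx="{x:.1f}" cy="{y:.1f}" r="2" fill="red"/>')
    parts.append("</svg>")
    return "\n".join(parts)


svg = layout_to_svg({"a": (0, 0), "b": (1, 0), "c": (0, 1)}, [("a", "b"), ("b", "c")])
```

    Note that SVG's y axis points down, so the picture is mirrored vertically relative to the usual mathematical orientation. Low edge opacity helps dense regions stay readable.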

  • 2020-11-30 17:55

    You can also try NAViGaTOR (disclosure: I'm one of the developers of that software). We've successfully visualized graphs with as many as 1.7 million edges with it, although networks that large are hard to manipulate (the user interface gets laggy). It does use OpenGL for the visualization, so some of the overhead is transferred to the graphics card.

    Also note that you'll have to crank up the memory settings in the File->Preferences dialog box before you can successfully open a network that big.

    Finally, as most of the other responses point out, you are better off re-organizing your data into something smaller and more meaningful.

  • 2020-11-30 17:56

    Mathematica could very likely handle it, but I have to admit my first reaction was along the lines of the comment that said "take a piece of paper and color it black." Is there no way to reduce the density of the graph?

    A possible issue is that you seem to be looking for layout, not just rendering. I have no knowledge about the Big O characteristics of the layouts implemented by various tools, but intuitively I would guess that it might take a long time to lay out that much data.

  • 2020-11-30 17:57

    I've had good results using the graph-tool library in Python. The graph below has 1,490 nodes and 19,090 edges; it took around 5 minutes to render on my laptop.

    [Image: political blogging network]

    The graph data comes from the political blogging network described by Adamic and Glance in “The political blogosphere and the 2004 US Election” (the PDF is linked in the code below). If you zoom in, you can see the blog URL for each node.

    [Image: zoomed view of the network]

    Here's the code I used to draw it (blog http://ryancompton.net/2014/10/22/stochastic-block-model-based-edge-bundles-in-graph-tool/ ):

    import graph_tool.all as gt
    import math
    
    g = gt.collection.data["polblogs"] #  http://www2.scedu.unibo.it/roversi/SocioNet/AdamicGlanceBlogWWW.pdf
    print(g.num_vertices(), g.num_edges())
    
    #reduce to only connected nodes
    g = gt.GraphView(g,vfilt=lambda v: (v.out_degree() > 0) and (v.in_degree() > 0) )
    g.purge_vertices()
    
    print(g.num_vertices(), g.num_edges())
    
    #use 1->Republican (red), 0->Democrat (blue)
    red_blue_map = {1:(1,0,0,1),0:(0,0,1,1)}
    plot_color = g.new_vertex_property('vector<double>')
    g.vertex_properties['plot_color'] = plot_color
    for v in g.vertices():
        plot_color[v] = red_blue_map[g.vertex_properties['value'][v]]
    
    #edge colors
    alpha=0.15
    edge_color = g.new_edge_property('vector<double>')
    g.edge_properties['edge_color']=edge_color
    for e in g.edges():
        if plot_color[e.source()] != plot_color[e.target()]:
            if plot_color[e.source()] == (0,0,1,1):
                #orange on dem -> rep
                edge_color[e] = (255.0/255.0, 102/255.0, 0/255.0, alpha)
            else:
                edge_color[e] = (102.0/255.0, 51/255.0, 153/255.0, alpha)            
        #red on rep-rep edges
        elif plot_color[e.source()] == (1,0,0,1):
            edge_color[e] = (1,0,0, alpha)
        #blue on dem-dem edges
        else:
            edge_color[e] = (0,0,1, alpha)
    
    state = gt.minimize_nested_blockmodel_dl(g, deg_corr=True)
    bstack = state.get_bstack()
    t = gt.get_hierarchy_tree(bstack)[0]
    tpos = pos = gt.radial_tree_layout(t, t.vertex(t.num_vertices() - 1), weighted=True)
    cts = gt.get_hierarchy_control_points(g, t, tpos)
    pos = g.own_property(tpos)
    b = bstack[0].vp["b"]
    
    #labels
    text_rot = g.new_vertex_property('double')
    g.vertex_properties['text_rot'] = text_rot
    for v in g.vertices():
        # atan2 gives the same angle (mod 2*pi) as the two-branch atan
        # version, without dividing by zero when x == 0
        text_rot[v] = math.atan2(pos[v][1], pos[v][0])
    
    gt.graph_draw(g, pos=pos, vertex_fill_color=g.vertex_properties['plot_color'], 
                vertex_color=g.vertex_properties['plot_color'],
                edge_control_points=cts,
                vertex_size=10,
                vertex_text=g.vertex_properties['label'],
                vertex_text_rotation=g.vertex_properties['text_rot'],
                vertex_text_position=1,
                vertex_font_size=9,
                edge_color=g.edge_properties['edge_color'],
                vertex_anchor=0,
                bg_color=[0,0,0,1],
                output_size=[4024,4024],
                output='polblogs_blockmodel.png')
    