Read/Write NetworkX Graph Object

北荒 2021-02-01 09:46

I am trying to work with a massive NetworkX Graph object with hundreds of millions of nodes. I'd like to be able to write it to a file so as not to consume all my computer mem

3 Answers
  • If you've built this as a NetworkX graph, then it is already in memory. For a graph this large, my guess is you'll have to do something similar to what you suggested with separate files. Instead of separate files, though, I'd use a database that stores the nodes with many-to-many connections between them: one table of nodes and one table of edges. To find the neighbors of a particular node, you query for every edge that has that node on either end. This should be fast, though I'm not sure you'll be able to take advantage of NetworkX's analysis functions without first building the whole network in memory.
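
The two-table layout described above could be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names, column names, and helper functions (open_graph_db, add_edge, neighbors) are hypothetical, node labels are assumed to be strings, and an in-memory database stands in for the on-disk file you would use for a graph of this size.

```python
import sqlite3


def open_graph_db(path=":memory:"):
    """Open (or create) a database with one table of nodes and one of edges."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS nodes (id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS edges ("
        "src TEXT NOT NULL, dst TEXT NOT NULL, PRIMARY KEY (src, dst))"
    )
    # The primary key already indexes lookups by src; add one for dst too.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_edges_dst ON edges (dst)")
    return conn


def add_edge(conn, src, dst):
    """Insert an edge, creating both endpoint nodes if they are missing."""
    conn.executemany(
        "INSERT OR IGNORE INTO nodes (id) VALUES (?)", [(src,), (dst,)]
    )
    conn.execute(
        "INSERT OR IGNORE INTO edges (src, dst) VALUES (?, ?)", (src, dst)
    )


def neighbors(conn, node):
    """Return every node that shares an edge with `node`, on either end."""
    rows = conn.execute(
        "SELECT dst FROM edges WHERE src = ? "
        "UNION SELECT src FROM edges WHERE dst = ?",
        (node, node),
    )
    return sorted(row[0] for row in rows)


conn = open_graph_db()  # pass a file path instead to keep the graph on disk
add_edge(conn, "a", "b")
add_edge(conn, "a", "c")
add_edge(conn, "d", "a")
conn.commit()
print(neighbors(conn, "a"))
```

Only the rows touched by a query are read, so the full graph never has to be held in memory at once.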

  • 2021-02-01 10:17

    First try pickle; it's designed to serialize most Python objects.

    An example of creating a DiGraph and serializing to a file:

    import pickle
    import networkx as nx
    
    dg = nx.DiGraph()
    dg.add_edge('a','b')
    dg.add_edge('a','c')
    # pickle writes bytes, so the file must be opened in binary mode
    with open('/tmp/graph.txt', 'wb') as f:
        pickle.dump(dg, f)
    

    An example of loading a DiGraph from a file:

    import pickle
    import networkx as nx
    
    # open in binary mode to match how the file was written
    with open('/tmp/graph.txt', 'rb') as f:
        dg = pickle.load(f)
    print(list(dg.edges()))
    

    Output:

    [('a', 'b'), ('a', 'c')]
    

    If this isn't efficient enough, consider writing your own routine to serialize:

    1. edges and
    2. nodes (in case a node is incident to no edges).

    Note that list comprehensions (or generator expressions, which avoid building an intermediate list) are usually faster than equivalent explicit for loops, so prefer them where possible.
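
A minimal sketch of such a routine, assuming node labels are strings that contain no tabs or newlines. The function names and the two-file layout (one file of nodes, one of tab-separated edges) are hypothetical choices for illustration. The writers and readers stream one record at a time, so neither side needs a second full copy of the graph in memory.

```python
import os
import tempfile


def write_graph(nodes, edges, nodes_path, edges_path):
    """Stream nodes and edges to two text files, one record per line."""
    with open(nodes_path, "w", encoding="utf-8") as f:
        f.writelines(f"{n}\n" for n in nodes)
    with open(edges_path, "w", encoding="utf-8") as f:
        f.writelines(f"{u}\t{v}\n" for u, v in edges)


def read_nodes(nodes_path):
    """Yield node labels lazily, one per line."""
    with open(nodes_path, encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")


def read_edges(edges_path):
    """Yield (source, target) pairs lazily, one per line."""
    with open(edges_path, encoding="utf-8") as f:
        for line in f:
            u, v = line.rstrip("\n").split("\t")
            yield u, v


tmp_dir = tempfile.mkdtemp()
nodes_path = os.path.join(tmp_dir, "nodes.txt")
edges_path = os.path.join(tmp_dir, "edges.tsv")

# "isolated" has no edges, which is why nodes are stored separately.
write_graph(
    ["a", "b", "c", "isolated"],
    [("a", "b"), ("a", "c")],
    nodes_path,
    edges_path,
)

loaded_nodes = list(read_nodes(nodes_path))
loaded_edges = list(read_edges(edges_path))
print(loaded_nodes)
print(loaded_edges)
```

With NetworkX you could pass dg.nodes() and dg.edges() to write_graph, and rebuild the graph with dg.add_nodes_from(read_nodes(nodes_path)) followed by dg.add_edges_from(read_edges(edges_path)).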

    If this is not efficient enough, I'd call a C++ routine from within Python: http://docs.python.org/extending/extending.html

  • 2021-02-01 10:25

    I forgot what problem I came to StackOverflow to solve originally, but I stumbled on this question and (nearly a decade too late!) can recommend Grand, a networkx-like library we wrote to solve exactly this problem:

    Before

    import networkx as nx
    
    g = nx.DiGraph()
    g.add_edge("A", "B")
    print(len(g.edges()))
    

    After

    import grand
    from grand.backends import SQLBackend # or choose another!
    
    g = grand.Graph(backend=SQLBackend())
    g.nx.add_edge("A", "B")
    print(len(g.nx.edges()))
    

    The API is the same as NetworkX, but the data live in SQL, DynamoDB, etc.
